In this paper, inference for the parametric component of a semiparametric
model based on sampling from the posterior profile distribution
is thoroughly investigated from the frequentist viewpoint.
The higher-order validity of the profile sampler obtained in Cheng 
and Kosorok [Ann. Statist. 36 (2008)] is extended to semiparametric 
models in which the infinite dimensional nuisance parameter may not 
have a root-n convergence rate. This is a nontrivial extension because 
it requires a delicate analysis of the entropy of the semiparametric 
models involved. We find that the accuracy of inferences based on 
the profile sampler improves as the convergence rate of the nuisance 
parameter increases. Simulation studies are used to verify this theo- 
retical result. We also establish that an exact frequentist confidence 
interval obtained by inverting the profile log-likelihood ratio can be 
estimated with higher-order accuracy by the credible set of the same 
type obtained from the posterior profile distribution. Our theory is 
verified for several specific examples. 

1. Introduction. Semiparametric models have the form
$\mathcal{P} = \{P_{\theta,\eta} : (\theta,\eta) \in \Theta \times \mathcal{H}\}$,
where $\Theta \subset \mathbb{R}^d$ and $\mathcal{H}$ is an arbitrary subset that is
typically infinite dimensional. In this paper, interest will focus on the
parametric component $\theta$, while the nonparametric component $\eta$ will be
considered a "nuisance parameter." Inference for $\theta$ will be based on
semiparametric maximum likelihood estimation via the profile likelihood
$pl_n(\theta) = \sup_{\eta \in \mathcal{H}} lik_n(\theta,\eta)$, where
$lik_n(\theta,\eta)$ is the full likelihood given n observations. The maximum
likelihood estimator for $(\theta,\eta)$ can be expressed as
$(\hat\theta_n, \hat\eta_n)$, where $\hat\eta_n = \hat\eta_{\hat\theta_n}$ and
$\hat\eta_\theta = \arg\max_{\eta \in \mathcal{H}} lik_n(\theta,\eta)$. We will
assume throughout this paper that evaluation of $pl_n(\theta)$ is computationally
feasible because of the availability of procedures such as the stationary point
algorithm (as used in [12], e.g.) or the iterative convex minorant algorithm
introduced in [7], to find $\hat\eta_\theta$ when $\eta$ is a monotone function.

Many of the advantages of using the profile sampler for inference on $\theta$
are discussed in [14], where the procedure was proposed. The main argument is
that direct maximization of the full likelihood and direct computation of the
efficient Fisher information function, which often requires tedious evaluation
of infinite-dimensional operators that may not have a closed form, can both be
avoided completely by using the profile sampler. This follows because the
profile sampler yields a first-order correct approximation to the maximum
likelihood estimator $\hat\theta_n$ and consistent estimation of the efficient
Fisher information for $\theta$, even when the nuisance parameter is not
estimable at the $\sqrt{n}$ rate.

Another approach to obtaining inference on $\theta$ is through the fully Bayesian
procedure, which assigns a prior on both the parameter of interest and the
functional nuisance parameter. The first-order valid results in [21] indicate
that the marginal semiparametric posterior is asymptotically normal and
centered at the corresponding maximum likelihood estimator or posterior
mean, with covariance matrix equal to the inverse of the efficient Fisher
information. Assigning a prior on $\eta$ can be quite challenging since for some
models there is no direct extension of the concept of a Lebesgue dominating
measure for the infinite-dimensional parameter set involved [13]. Compared
with the profile sampler procedure, this marginal approach does not circumvent
the need to specify a prior on $\eta$, with all of the difficulties that entails.
However, we can essentially generate the profile sampler from the marginal
posterior of $\theta$ with respect to a certain joint prior on
$\psi = (\theta,\eta)$, which is possibly data dependent. For example, in the Cox
model with right-censored data we can use a gamma process prior on $\eta$ with
jumps at observed event times but not involving $\theta$; see Remark 7 in [3].

The first-order validity of the profile sampler procedure established by
[14] is extended to second-order validity in [4] when the infinite-dimensional
nuisance parameter achieves the parametric rate. Specifically, higher-order
estimates of the maximum profile likelihood estimator and of the efficient
Fisher information are obtained in [4]. Moreover, [4] also proves that an
exact frequentist confidence interval for the parametric component at level
$\alpha$ can be estimated by the $\alpha$-level credible set from the profile
sampler with an error of order $O_P(n^{-1})$. Three rather different
semiparametric models, the Cox model with right-censored data, the proportional
odds model with right-censored data and case-control studies with a missing
covariate, are studied in [4]. Such higher-order frequentist accuracy had not
previously been established in semiparametric models for any other inferential
approach, including the bootstrap. We note that this idea of higher-order
accuracy is quite distinct from the concept of second-order efficiency in
semiparametric models (see [6, 8]), which we do not consider further in this
paper.

A natural question is whether the second-order extension in [4] can be
further extended to settings where the nuisance parameter has arbitrary
convergence rates, in particular, rates that are slower than the parametric rate.
This extension is the key purpose of this paper. Additionally, we generalize
the results to allow for multivariate parametric components (only univariate
components were permitted in [4]) and also show that another type of
confidence interval for $\theta$, obtained by inverting the profile log-likelihood
ratio, can also be estimated with higher-order accuracy by the profile sampler.

In this paper, the convergence rate for the nuisance parameter $\eta$ is
defined as the largest $r$ that satisfies
$\|\hat\eta_{\tilde\theta_n} - \eta_0\| = O_P(\|\tilde\theta_n - \theta_0\| + n^{-r})$,
where $\eta_0$ is the true value of $\eta$ and $\|\cdot\|$ is a norm with
definition depending on context, that is, for a Euclidean vector $u$, $\|u\|$ is
the Euclidean norm, and for an element of the nuisance parameter space
$\eta \in \mathcal{H}$, $\|\eta\|$ is some chosen norm on $\mathcal{H}$. In regular
semiparametric models, which we can define without loss of generality to be
models where the entropy integral converges, $r$ is always larger than 1/4.
Obviously $\hat\eta_{\tilde\theta_n} \to \eta_0$ in probability for any
$\tilde\theta_n \to \theta_0$ in probability. We say the nuisance parameter has
parametric rate if $r = 1/2$. For instance, the nuisance parameters of the three
examples in [4] achieve the parametric rate. More specifically, the nuisance
parameter in the Cox model, which is the cumulative hazard function, has the
parametric rate under right-censored data. However, the convergence rate for
the cumulative hazard becomes slower, that is, $r = 1/3$, under current status
data. The result is not surprising since current status data cannot provide as
much information as right-censored data.

Obviously our results for $r = 1/2$ coincide with the results in [4]. It is
also no surprise that the accuracy of the profile sampler depends on
the convergence rate of the nuisance parameter. The precise error rate for
many of the quantities we study is $O_P(M_n(r))$, where we define
$M_n(r) = n^{-1/2} \vee n^{-2r+1/2}$ (the larger of the two terms) with support
$r > 1/4$. Note that $M_n(r)$ decreases in $r$ over the interval
$1/4 < r < 1/2$ and is constant, equal to $n^{-1/2}$, for $r \geq 1/2$. Although
we cannot yet prove it, we conjecture that this error rate is sharp, in the
sense that when the error is multiplied by $M_n^{-1}(r)$, it converges to a
nondegenerate random quantity as $n \to \infty$.
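
To make the behavior of the rate function concrete, the following minimal sketch evaluates $M_n(r)$ for a few rates. It assumes that $M_n(r)$ is the maximum of $n^{-1/2}$ and $n^{-2r+1/2}$; the function name `error_rate` is hypothetical and chosen only for this illustration.

```python
def error_rate(n: int, r: float) -> float:
    """Rate function M_n(r) = max(n^{-1/2}, n^{-2r+1/2}), defined for r > 1/4.

    Assumption: the two terms are combined with a maximum, so the rate
    is truncated at n^{-1/2} once r reaches 1/2.
    """
    if r <= 0.25:
        raise ValueError("M_n(r) is only defined for r > 1/4")
    return max(n ** -0.5, n ** (-2.0 * r + 0.5))


if __name__ == "__main__":
    n = 10_000
    # Slower nuisance rates (smaller r) give a larger error bound.
    for r in (0.3, 1 / 3, 0.4, 0.5, 0.75):
        print(f"r = {r:.3f}: M_n(r) = {error_rate(n, r):.5f}")
```

For the cube-root rate $r = 1/3$ the bound is $n^{-1/6}$, while any $r \geq 1/2$ gives the parametric bound $n^{-1/2}$.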

Perhaps the most important new result in this paper involves a comparison
between an exact, frequentist confidence interval and a credible set for
$\theta$ generated from the profile sampler. Specifically, we show that any
rectangular credible set for $\theta$ of level $1 - \alpha$ based on the profile
sampler is within $O_P(n^{-1/2}M_n(r))$ of an exact, frequentist, rectangular
confidence region with coverage $1 - \alpha$. Note that the choice of a
one-sided credible set at a given level is not unique when the parameter
dimension is $\geq 2$. We also establish higher-order accuracy for the
confidence interval obtained by inverting the profile log-likelihood ratio,
defined as $PLR_n(\theta) = 2(\log pl_n(\hat\theta_n) - \log pl_n(\theta))$.

The next section, Section 2, provides some necessary background material
on semiparametric models, least favorable submodels, and empirical
processes. The main concepts are illustrated with three examples which will
be used throughout the paper. The primary assumptions required for the
results of the paper are also presented, along with a key tool for obtaining
rates of convergence. In Section 3, second-order asymptotic expansions of
the log-profile likelihood are presented. In Section 4, we present the main
result of the paper that the confidence interval for the parametric component
of a semiparametric model can be approximated by the credible set
based on the profile sampler with error of order $O_P(M_n(r))$. In Section 5,
we establish that the required assumptions are satisfied for the three
previously introduced examples and present some simulation results. Section 6
contains a brief discussion of future research directions, and proofs are given
in the Appendix.

2. Background and assumptions. We assume the data $X_1, \ldots, X_n$ are
i.i.d. throughout the paper. In what follows, we first briefly review the concept
of a least favorable submodel. We then present three different examples
for which we discuss the forms of the least favorable submodel and related 
model specifications. Next, we present the model assumptions needed for 
the remainder of the paper, and, finally, we give a key tool for the rate of 
convergence calculations needed in later sections. 

2.1. The least favorable submodel. The score function for $\theta$,
$\dot\ell_{\theta,\eta}$, is defined as the partial derivative with respect to
$\theta$ of the log-likelihood for a single observation, with $\eta$ held fixed.
A score function for $\eta_0$ is of the form

$$\frac{\partial}{\partial t}\Big|_{t=0} \log p_{\theta_0,\eta_t}(x) \equiv A_{\theta_0,\eta_0}h(x),$$

where $h$ is a "direction" by which $\eta_t \in \mathcal{H}$ approaches $\eta_0$,
running through some index set $H$. $A_{\theta,\eta} : H \mapsto L_2^0(P_{\theta,\eta})$
is the score operator for $\eta$. The efficient score function for $\theta$ is
defined as $\tilde\ell_{\theta,\eta} = \dot\ell_{\theta,\eta} - \Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$,
where $\Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$ minimizes the squared distance
$P_{\theta,\eta}(\dot\ell_{\theta,\eta} - k)^2$ over all functions $k$ in the
closed linear space of the score functions for $\eta$ (the "nuisance scores").
A submodel $t \mapsto p_{t,\eta_t}$ is defined to be least favorable at
$(\theta,\eta)$ if $\tilde\ell_{\theta,\eta} = \partial/\partial t \log p_{t,\eta_t}$,
given $t = \theta$. The variance of $\tilde\ell_{\theta,\eta}$ is the efficient
information matrix $\tilde I_{\theta,\eta}$, and its inverse is the Cramer-Rao
bound for estimating $\theta$ in the presence of the infinite-dimensional
nuisance parameter $\eta$. We also abbreviate $\tilde\ell_{\theta_0,\eta_0}$ and
$\tilde I_{\theta_0,\eta_0}$ with $\tilde\ell_0$ and $\tilde I_0$, respectively.
The direction $h$ along which $\eta_t$ approaches $\eta$ in the least favorable
submodel is called "the least favorable direction." An insightful review of
least favorable submodels and efficient score functions can be found in
Chapter 3 of [11].

The least favorable submodel in this paper will be constructed in the
following manner: We first assume the existence of a smooth map from the
neighborhood of $\theta$ into the parameter set for $\eta$, of the form
$t \mapsto \eta_t(\theta,\eta)$, such that the map $t \mapsto \ell(t,\theta,\eta)(x)$
can be defined as follows:

(1) $\ell(t,\theta,\eta)(x) = \log lik(t, \eta_t(\theta,\eta))(x)$,

where $t$ and $\theta$ are allowed to be multi-dimensional in this paper, although
they both must have the same dimension, and where we require
$\eta_\theta(\theta,\eta) = \eta$ for all $(\theta,\eta) \in \Theta \times \mathcal{H}$.
We will now illustrate the form of this map for several examples, and the
remaining requirements for the map will be presented when the model
assumptions are listed later on in this section.

2.2. Examples. Three examples with different convergence rates are pre- 
sented in this subsection. The Cox model with right-censored data, which 
has a parametric convergence rate, has previously been studied in [4]. Never- 
theless, it will be useful to review this example briefly here, although most of 
the details are given in [4]. The second example, the Cox model with current 
status data, has a cube-root convergence rate for the nuisance parameter. 
The last example is the partly linear regression model with normal residual 
error, where the convergence rate of the nuisance parameter is $n^{-2/5}$ under
current status data. 

2.2.1. Example 1. The Cox model with right-censored data. In the Cox
proportional hazards model, the hazard function of the survival time $T$ of a
subject with covariate $Z$ is expressed as

(2) $\lambda(t|z) \equiv \lim_{\Delta \downarrow 0} \frac{1}{\Delta}\Pr(t \leq T < t + \Delta \mid T \geq t, Z = z) = \lambda(t)\exp(\theta z)$,

where $\lambda$ is an unspecified baseline hazard function and $\theta$ is a vector
including the regression parameters [5]. For the Cox model applied to
right-censored failure time data, we observe $X = (Y, \delta, Z)$, where
$Y = T \wedge C$, $\delta = 1\{T \leq C\}$, and $Z \in \mathcal{Z} \subset \mathbb{R}^d$
is a regression covariate. The cumulative hazard function
$\Lambda(y) = \int_0^y \lambda(t)\,dt$ is considered the nuisance parameter. The
convergence rate of the estimated nuisance parameter is established in
Theorem 3.1 of [17], that is,
$\|\hat\Lambda_{\tilde\theta_n} - \Lambda_0\|_\infty = O_P(n^{-1/2} + \|\tilde\theta_n - \theta_0\|)$.

Based on the model assumptions specified in Section 5.1 of [4], we can
express the likelihood for $(\theta,\eta)$ in the following form:

$$lik(\theta,\Lambda) = \left(e^{\theta z}\Lambda\{y\}e^{-e^{\theta z}\Lambda(y)}\right)^{\delta}\left(e^{-e^{\theta z}\Lambda(y)}\right)^{1-\delta},$$

by replacing $\lambda(y)$ by the point mass $\Lambda\{y\}$. Hence the score
functions for $\theta$ and $\Lambda$ can be easily derived as
$\dot\ell_{\theta,\Lambda}(x) = \delta z - z e^{\theta z}\Lambda(y)$ and
$A_{\theta,\Lambda}h(y,\delta,z) = \delta h(y) - e^{\theta z}\int_0^y h\,d\Lambda$.
Again, by the derivations in Section 5.1 of [4], the least favorable direction
at $(\theta,\Lambda)$, denoted $h_{\theta,\Lambda}$, can be shown to be

$$h_{\theta,\Lambda}(y) = \frac{E_{\theta,\Lambda}\,e^{\theta Z}Z\,1\{Y \geq y\}}{E_{\theta,\Lambda}\,e^{\theta Z}\,1\{Y \geq y\}}.$$

If we let $h_0$ denote the least favorable direction at the true parameters, the
least favorable submodel $\ell(t,\theta,\Lambda)$ has the form
$\ell(t,\theta,\Lambda) = \log lik(t, \Lambda_t(\theta,\Lambda))$, where
$t \mapsto \Lambda_t(\theta,\Lambda) = \Lambda + (\theta - t)h_0$. Note that we have
tacitly swapped the notation for $\eta$ with $\Lambda$ since $\Lambda$ is more
widely used in this context.
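
As an illustration of the least favorable direction in this example, the sketch below computes a plug-in version in which the two expectations are replaced by sample averages over the subjects still at risk at time $y$. It assumes a scalar covariate and the ratio-of-expectations form of $h_{\theta,\Lambda}$; the function name and the toy data are hypothetical and not taken from [4].

```python
import math
from typing import Sequence


def least_favorable_direction(y: float, theta: float,
                              times: Sequence[float],
                              covs: Sequence[float]) -> float:
    """Plug-in estimate of E[e^{theta Z} Z 1{Y >= y}] / E[e^{theta Z} 1{Y >= y}].

    times: observed follow-up times Y_i; covs: scalar covariates Z_i.
    """
    num = 0.0
    den = 0.0
    for t, z in zip(times, covs):
        if t >= y:  # subject i is still at risk at time y
            w = math.exp(theta * z)
            num += w * z
            den += w
    if den == 0.0:
        raise ValueError("no subject is at risk at time y")
    return num / den


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    covs = [0.0, 1.0, 1.0]
    # With theta = 0 the direction is the mean covariate in the risk set.
    print(least_favorable_direction(1.0, 0.0, times, covs))
    print(least_favorable_direction(2.0, 0.0, times, covs))
```

The estimate is a weighted average of the covariates in the risk set, with weights $e^{\theta Z_i}$, which is the same quantity that appears in the Cox partial likelihood score.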

2.2.2. Example 2. The Cox model with current status data. Current sta- 
tus data arises when each subject is observed at a single examination time, 
Y, to determine if an event has occurred. The event time, T, cannot be 
known exactly. If a vector of covariates, $Z$, is also available, then the
observed data are n i.i.d. realizations of
$X = (Y, \delta, Z) \in \mathbb{R}^+ \times \{0,1\} \times \mathbb{R}$, where
$\delta = 1\{T \leq Y\}$. The model of the conditional hazard given $Z$ is the
same as in the previous example. Throughout the remainder of the discussion, we
make the following assumptions: $T$ and $Y$ are independent given $Z$. $Z$ lies
in a compact set almost surely and the covariance of $Z - E(Z|Y)$ is positive
definite, which guarantees the efficient information $\tilde I_0$ to be positive
definite. $Y$ possesses a Lebesgue density which is continuous and positive on
its support $[\sigma,\tau]$, for which the true nuisance parameter $\Lambda_0$
satisfies $\Lambda_0(\sigma-) > 0$ and $\Lambda_0(\tau) < M < \infty$, and this
density is continuously differentiable on $[\sigma,\tau]$ with derivative
bounded above and bounded below by zero. Under these assumptions the maximum
likelihood estimator of $(\theta,\Lambda)$ exists, $\hat\theta_n$ is
asymptotically efficient and $\|\hat\Lambda_n - \Lambda_0\|_{L_2} = O_P(n^{-1/3})$,
where $\|\cdot\|_{L_2}$ is the norm on $L_2([\sigma,\tau])$. Note that the
conditions on the density of $Y$ ensure that $\|\Lambda - \Lambda_0\|_{L_2}$ is
equivalent to $(\int_\sigma^\tau(\Lambda(y) - \Lambda_0(y))^2\,dF^Y(y))^{1/2}$,
where $F^Y(y)$ is the distribution of the observation time $Y$. Moreover, using
entropy methods, [17] extends earlier results of [9], showing that
$\|\hat\Lambda_{\tilde\theta_n} - \Lambda_0\|_{L_2} = O_P(\|\tilde\theta_n - \theta_0\| + n^{-1/3})$.
It is not difficult to derive the log-likelihood

(3) $\log lik_n(\theta,\Lambda) = \sum_{i=1}^n \left\{\delta_i\log[1 - \exp(-\Lambda(Y_i)\exp(\theta Z_i))] - (1 - \delta_i)\exp(\theta Z_i)\Lambda(Y_i)\right\}$.

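The score function takes the form
$\dot\ell_{\theta,\Lambda}(x) = z\Lambda(y)Q(x;\theta,\Lambda)$, where

$$Q(x;\theta,\Lambda) = e^{\theta z}\left[\delta\,\frac{\exp(-e^{\theta z}\Lambda(y))}{1 - \exp(-e^{\theta z}\Lambda(y))} - (1 - \delta)\right].$$
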
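
The single-observation version of the log-likelihood (3) and the score function can be checked numerically. The sketch below is only an illustration under stated assumptions: it treats $\Lambda(y)$ as a positive number `lam`, uses a scalar covariate, and compares the closed-form score $z\Lambda(y)Q(x;\theta,\Lambda)$ with a central finite difference. All function names are hypothetical.

```python
import math


def loglik_one(theta: float, lam: float, delta: int, z: float) -> float:
    """One term of the log-likelihood (3); lam stands for Lambda(y) > 0."""
    a = math.exp(theta * z) * lam
    # -expm1(-a) equals 1 - exp(-a) and is stable for small a.
    return delta * math.log(-math.expm1(-a)) - (1 - delta) * a


def q_fun(theta: float, lam: float, delta: int, z: float) -> float:
    """Q(x; theta, Lambda) as defined in the text."""
    a = math.exp(theta * z) * lam
    ratio = math.exp(-a) / (-math.expm1(-a))
    return math.exp(theta * z) * (delta * ratio - (1 - delta))


def score_theta(theta: float, lam: float, delta: int, z: float) -> float:
    """Score for theta: z * Lambda(y) * Q(x; theta, Lambda)."""
    return z * lam * q_fun(theta, lam, delta, z)


if __name__ == "__main__":
    h = 1e-6
    for delta in (0, 1):
        fd = (loglik_one(0.3 + h, 0.8, delta, 1.5)
              - loglik_one(0.3 - h, 0.8, delta, 1.5)) / (2 * h)
        print(delta, fd, score_theta(0.3, 0.8, delta, 1.5))
```

The same check applied to the second argument shows that the derivative of the log-likelihood with respect to the value $\Lambda(y)$ is $Q(x;\theta,\Lambda)$ itself, which is why $Q$ also appears in the score for $\Lambda$.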

Inserting a submodel $t \mapsto \Lambda_t$ such that
$h(y) = -\partial/\partial t|_{t=0}\Lambda_t(y)$ exists for every $y$ into the
log likelihood and differentiating at $t = 0$, we obtain a score function for
$\Lambda$ of the form $A_{\theta,\Lambda}h(x) = h(y)Q(x;\theta,\Lambda)$. The
linear span of these functions contains $A_{\theta,\Lambda}h$ for all bounded
functions $h$ of bounded variation. The efficient score function for $\theta$ is
defined as $\tilde\ell_{\theta,\Lambda} = \dot\ell_{\theta,\Lambda} - A_{\theta,\Lambda}h_{\theta,\Lambda}$
for the vector of functions $h_{\theta,\Lambda}$ minimizing the distance
$P_{\theta,\Lambda}\|\dot\ell_{\theta,\Lambda} - A_{\theta,\Lambda}h\|^2$, which
is also called the least favorable direction. The solution at the true
parameter $(\theta_0,\Lambda_0)$ is $h_0(Y)$ defined as follows:

(4) $h_0(y) = \Lambda_0(y)h_{00}(y) \equiv \Lambda_0(y)\,\frac{E_{\theta_0,\Lambda_0}(ZQ^2(X;\theta_0,\Lambda_0) \mid Y = y)}{E_{\theta_0,\Lambda_0}(Q^2(X;\theta_0,\Lambda_0) \mid Y = y)}$.

As the formula shows, the vector of functions $h_0(y)$ is unique a.s., and
$h_0(y)$ is a bounded function since $Q(x;\theta_0,\Lambda_0)$ is bounded away
from zero and infinity. We shall assume the function $y \mapsto h_{00}(y)$ given
by (4) has a version which is differentiable with a bounded derivative on
$[\sigma,\tau]$.

The least favorable submodel can be defined as
$\ell(t,\theta,\Lambda) = \log lik(t, \Lambda_t(\theta,\Lambda))$, where
$\Lambda_t(\theta,\Lambda) = \Lambda + (\theta - t)\phi(\Lambda)(h_{00} \circ \Lambda_0^{-1}) \circ \Lambda$,
and $\phi(\cdot)$ is a specially constructed function that smoothly approximates
the identity. The function $\Lambda_t(\theta,\Lambda)$ is essentially $\Lambda$
plus a perturbation in the least favorable direction, $h_0$, but its definition
is somewhat complicated in order to ensure that $\Lambda_t(\theta,\Lambda)$
really defines a cumulative hazard function within our parameter space, at
least for $t$ that is sufficiently close to $\theta$. The details on the
construction of the least favorable submodel can be found on page 23 of [16].

2.2.3. Example 3. Partly linear normal model with current status data.
In this example, a continuous outcome $Y$, conditional on the covariates
$(W,Z) \in \mathbb{R}^d \times \mathbb{R}$, is modeled as
$Y = \theta^T W + k(Z) + \xi$, where $k$ is an unknown smooth function, and
$\xi \sim N(0,1)$. Note that the choice $N(0,1)$ is needed for model
identifiability. We are interested in the regression parameter $\theta$ and
consider $k(\cdot)$ to be an infinite-dimensional nuisance parameter. However,
the response $Y$ is not observed directly, but only its current status is
observed at a random censoring time $C \in \mathbb{R}$. In other words, we observe
$X = (C, \Delta, W, Z)$, where $\Delta = 1\{Y \leq C\}$. Additionally, $Y$ and $C$
are assumed to be independent given $(W,Z)$. Although it is not difficult to
generalize to multivariate $\theta$, we restrict our attention to univariate
$\theta$ in what follows for ease of exposition.

Under the partly linear model, the log-likelihood for a single observation
at $X = x \equiv (c,\delta,w,z)$ can be shown to have the form

(5) $\log lik_{\theta,k}(x) = \delta\log\{\Phi(c - \theta w - k(z))\} + (1 - \delta)\log\{1 - \Phi(c - \theta w - k(z))\}$,

where $\Phi$ is the standard normal distribution function. We further assume
that the joint distribution for $(C,W,Z)$ is strictly positive and finite. The
covariates $(W,Z)$ are assumed to belong to some compact set
$\mathcal{W} \times \mathcal{Z} \subset \mathbb{R}^2$, and the random censoring
time $C$ is assumed to have support $[l_c,u_c]$, where
$-\infty < l_c < u_c < \infty$. In addition, we assume $E[\mathrm{Var}(W|Z)]$ is
strictly positive and $Ek(Z) = 0$.

The regression parameter $\theta$ is assumed to belong to some compact set
in $\mathbb{R}$, and the functional nuisance parameter $k$ is assumed to belong to
$\mathcal{O}_2^M \equiv \{f : J_2(f) + \|f\|_\infty < M\}$ for a known
$M < \infty$. The mth-order Sobolev norm of a function $f$, $J_m(f)$, is defined
as $J_m(f) = [\int_{\mathcal{Z}}(f^{(m)}(z))^2\,dz]^{1/2}$. Here, $m$ is a fixed
integer and $f^{(j)}$ is the jth derivative of $f(\cdot)$ with respect to $z$. The
mth-order Sobolev class of functions is the class of functions $f$ supported on
some compact set on the real line with $J_m(f) < \infty$. Hence the class
$\mathcal{O}_2^M$ is trivially a subset of the second-order Sobolev class of
functions, and $k \in \mathcal{O}_2^M$ has known upper bounds for both its
uniform norm and its Sobolev norm. Note that the asymptotic behavior of
penalized log-likelihood estimates in this model has been extensively studied
in [15].

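We now introduce the least favorable submodel. The score function for $\theta$
is $\dot\ell_{\theta,k} = wQ(x;\theta,k)$, where

$$Q(X;\theta,k) = (1 - \Delta)\frac{\phi(q_{\theta,k}(X))}{1 - \Phi(q_{\theta,k}(X))} - \Delta\frac{\phi(q_{\theta,k}(X))}{\Phi(q_{\theta,k}(X))},$$

$\phi$ is the standard normal density and $q_{\theta,k}(X) = C - \theta W - k(Z)$.
Furthermore, by defining $k_t = k + th$ for $h \in \mathcal{O}_2$, we can obtain
the score function for $k$ in the direction $h$:
$A_{\theta,k}h(x) = h(z)Q(x;\theta,k)$. The least favorable direction
$h_{\theta,k}$ minimizes $h \mapsto P_{\theta,k}\|\dot\ell_{\theta,k} - A_{\theta,k}h\|^2$.
By solving the equation
$P_{\theta,k}(\dot\ell_{\theta,k} - A_{\theta,k}h_{\theta,k})A_{\theta,k}h = 0$,
we can obtain the solution at the true parameter values:

$$h_0(z) = \frac{E_0(WQ^2(X;\theta,k) \mid Z = z)}{E_0(Q^2(X;\theta,k) \mid Z = z)},$$

where $E_0$ is the expectation relative to the true parameters. Thus the least
favorable submodel can be constructed as
$\ell(t,\theta,k) = \log lik(t, k_t(\theta,k))$, where
$k_t(\theta,k) = k + (\theta - t)h_0$.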
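
The log-likelihood (5) and the score for $\theta$ in this example can be verified numerically in the same spirit. The sketch below is an illustration only: it assumes univariate $\theta$, represents $k(z)$ by a single number `k_z`, and uses hypothetical function names. It compares $wQ(x;\theta,k)$ with a central finite difference of (5).

```python
import math


def norm_cdf(u: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def norm_pdf(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def loglik_one(theta: float, k_z: float, c: float, delta: int, w: float) -> float:
    """Log-likelihood (5) for one observation x = (c, delta, w, z), k_z = k(z)."""
    q = c - theta * w - k_z
    return delta * math.log(norm_cdf(q)) + (1 - delta) * math.log(1.0 - norm_cdf(q))


def q_fun(theta: float, k_z: float, c: float, delta: int, w: float) -> float:
    """Q(x; theta, k) from the text, with q = c - theta * w - k(z)."""
    q = c - theta * w - k_z
    return ((1 - delta) * norm_pdf(q) / (1.0 - norm_cdf(q))
            - delta * norm_pdf(q) / norm_cdf(q))


def score_theta(theta: float, k_z: float, c: float, delta: int, w: float) -> float:
    """Score for theta: w * Q(x; theta, k)."""
    return w * q_fun(theta, k_z, c, delta, w)


if __name__ == "__main__":
    h = 1e-6
    for delta in (0, 1):
        fd = (loglik_one(0.7 + h, -0.1, 0.4, delta, 1.2)
              - loglik_one(0.7 - h, -0.1, 0.4, delta, 1.2)) / (2 * h)
        print(delta, fd, score_theta(0.7, -0.1, 0.4, delta, 1.2))
```

Perturbing `k_z` instead of `theta` gives a derivative equal to `q_fun`, matching the form $h(z)Q(x;\theta,k)$ of the score for $k$ in the direction $h$.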

Note that the above model would be more flexible if we did not require 
knowledge of $M$. A sieved estimator could be obtained if we replaced $M$
with a sequence $M_n \to \infty$. The theory we propose in this paper will be
applicable in this setting, but, in order to maintain clarity of exposition, we 
have elected not to pursue this more complicated situation here. Another 
alternative approach is to use penalization. However, this is beyond the scope 
of the present paper. 

2.3. Assumptions. We now present the assumptions that will be used 
throughout the paper, along with some necessary notation. The dependence 
on $x \in \mathcal{X}$ of the likelihood and score quantities will be largely suppressed for
clarity in this section and hereafter. 

For the vector $V$, matrix $M$ and tensor $T$, the notation $V_i$, $M_{i,j}$ and
$T_{i,j,k}$ indicate its ith, (i,j)th and (i,j,k)th element, respectively. $M^T$
represents the transpose of the matrix $M$. The derivative of the log-likelihood
of the least favorable submodel is with respect to the first argument, $t$. The
quantities $\dot\ell(t,\theta,\eta)$, $\ddot\ell(t,\theta,\eta)$ and
$\ell^{(3)}(t,\theta,\eta)$ are, respectively, the first, second and third
derivatives of $\ell(t,\theta,\eta)$ with respect to $t$. For brevity, we denote
$\dot\ell_0 = \dot\ell(\theta_0,\theta_0,\eta_0)$,
$\ddot\ell_0 = \ddot\ell(\theta_0,\theta_0,\eta_0)$ and
$\ell^{(3)}_0 = \ell^{(3)}(\theta_0,\theta_0,\eta_0)$, where $\theta_0$, $\eta_0$
are the true values of $\theta$ and $\eta$. Of course, $\dot\ell_0(X)$ can also be
written as $\tilde\ell_0(X)$ based on the construction of the least favorable
submodel. The quantity $\ell^{(3)}(t,\theta,\eta)$ is a tensor. We thus define
$V^T \otimes P\ell^{(3)}(t,\theta,\eta) \otimes V$ as a d-dimensional vector
whose ith element equals
$V^T(\partial^2/\partial t^2)(P\dot\ell(t,\theta,\eta))_i V$. Similarly
$V^T \otimes \ell^{(3)}(t,\theta,\eta)$ is a square matrix whose (i,j)th element
is $V^T(\partial/\partial t)(\partial^2/\partial t_i\partial t_j)\ell(t,\theta,\eta)$.
We use $\ell_{t_i,t_j,t_k}(t,\theta,\eta)$ to denote
$(\partial^3/\partial t_i\partial t_j\partial t_k)\ell(t,\theta,\eta)$. For the
derivatives relative to the other two arguments, $\theta$ and $\eta$, we use the
following shortened notation: $\ell_\theta(t,\theta,\eta)$ indicates the first
derivative of $\ell(t,\theta,\eta)$ with respect to $\theta$. Similarly,
$\ell_{t,\theta}(t,\theta,\eta)$ denotes the derivative of
$\dot\ell(t,\theta,\eta)$ with respect to $\theta$. Also, $\ell_{t,t}(\theta)$ and
$\ell_{t,\theta}(\eta)$ indicate the maps $\theta \mapsto \ddot\ell(t,\theta,\eta)$
and $\eta \mapsto \ell_{t,\theta}(t,\theta,\eta)$,

~l/2 

respectively. Let the random vector g n denote I (9 — 9 n ), and let <t>d{-) 
($\Phi_d(\cdot)$) represent the density (cumulative distribution) of a d-dimensional
standard normal random variable ($N_d(0,I)$). The notations $\gtrsim$ and
$\lesssim$ mean $\geq$, or $\leq$, up to a universal constant. Define $x \vee y$
($x \wedge y$) to be the maximum (minimum) value of $x$ and $y$. The symbols
$\mathbb{P}_n$ and $\mathbb{G}_n \equiv \sqrt{n}(\mathbb{P}_n - P)$ are used
for the empirical distribution and the empirical process of the observations,
respectively.

We now make the following assumptions in order to achieve the desired 
second-order asymptotic expansions of the log-profile likelihood (15). The 
assumption A2 below guarantees that the least favorable submodel passes 
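through $(\theta,\eta)$:

Regular assumptions:

A1. $\theta \in \Theta \subset \mathbb{R}^d$, where $\Theta$ is compact and $\theta_0$ is an interior point of $\Theta$.
A2. $\eta_\theta(\theta,\eta) = \eta$ for any $(\theta,\eta) \in \Theta \times \mathcal{H}$.
A3. $\tilde I_0$ is positive definite.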

We next describe the smoothness conditions for the least favorable submodel.
The assumptions B1 and B2 below are, respectively, the smoothness conditions
for the Euclidean parameter $(t,\theta)$ and the infinite-dimensional
nuisance parameter $\eta$. In principle, these assumptions directly imply the
no-bias conditions:

$$\mathbb{P}_n\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) = \mathbb{P}_n\tilde\ell_0 + O_P(n^{-1/2} + n^{-r} + \|\tilde\theta_n - \theta_0\|)^2,$$

$$\mathbb{P}_n\ddot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) = P\ddot\ell_0 + O_P(n^{-1/2} + n^{-r} + \|\tilde\theta_n - \theta_0\|),$$

for $\tilde\theta_n \to \theta_0$ in probability, thus making the profile
likelihood behave like a standard parametric likelihood asymptotically.
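Smoothness assumptions:

B1. The maps

$$(t,\theta,\eta) \mapsto \frac{\partial^{l+m}}{\partial t^l\,\partial\theta^m}\ell(t,\theta,\eta)$$

have integrable envelope functions in $L_1(P)$ in some neighborhood of
$(\theta_0,\theta_0,\eta_0)$ for (l,m) = (0,0), (1,0), (2,0), (3,0), (1,1), (1,2), (2,1).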
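B2. Assume:

(6) $\mathbb{G}_n(\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) - \dot\ell_0) = O_P(M_n(r) + (n^{1/2-r} \vee 1)\|\tilde\theta_n - \theta_0\|)$,

(7) $P\ddot\ell(\theta_0,\theta_0,\eta) - P\ddot\ell(\theta_0,\theta_0,\eta_0) = O(\|\eta - \eta_0\|)$,

(8) $P\ell_{t,\theta}(\theta_0,\theta_0,\eta) - P\ell_{t,\theta}(\theta_0,\theta_0,\eta_0) = O(\|\eta - \eta_0\|)$,

(9) $P\dot\ell(\theta_0,\theta_0,\eta) = O(\|\eta - \eta_0\|^2)$,

for $\tilde\theta_n \to \theta_0$ in probability and all $\eta$ in some neighborhood of $\eta_0$.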

There are three approaches to verifying the smoothness assumption (6),
which is essentially a continuity modulus of
$|\mathbb{G}_n\dot\ell(\theta_0,\theta_0,\eta) - \mathbb{G}_n\dot\ell_0|$. If the
nuisance parameter has parametric convergence rate, we only need to show
that the class of functions

$$\left\{\frac{\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell_0}{\|\eta - \eta_0\|} : \text{for } \eta \text{ in some neighborhood of } \eta_0\right\}$$

belongs to a P-Donsker class. Alternatively, if the nuisance parameter has
the cube-root rate, the continuity modulus of the empirical process turns out
to be of the order $O_P(n^{-1/6} + n^{1/6}\|\tilde\theta_n - \theta_0\|)$, or
equivalently $O_P(n^{-1/6} + \|\hat\eta_{\tilde\theta_n} - \eta_0\|^{1/2})$. The
method used to check this condition depends on the norm of the nuisance
parameter and the bracketing entropy number of the class of functions
$\mathcal{Q} = \{\dot\ell(\theta_0,\theta_0,\eta)$ for $\eta$ in some neighborhood
of $\eta_0\}$. When $\|\cdot\|$ is the $L_2$ norm or one of its dominating norms,
we can make use of Lemma 5.13 in [22]. Another approach is to calculate the
order of $E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}$, where
$\mathcal{F} = \{(\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) - \dot\ell_0)/(M_n(r) + (n^{1/2-r} \vee 1)\|\tilde\theta_n - \theta_0\|)\}$,
by the use of Lemma 3.4.2 in [23]. The last two methods will be respectively
employed later on in verifying the assumptions for the second and third main
examples.

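Boundedness of the Frechet derivatives of the maps
$\eta \mapsto \ddot\ell(\theta_0,\theta_0,\eta)$ and
$\eta \mapsto \ell_{t,\theta}(\theta_0,\theta_0,\eta)$ is sufficient to ensure
validity of conditions (7) and (8). Note that
$P\ell_{t,\theta}(\theta_0,\theta_0,\eta_0) = 0$ by the following analysis: Fixing
$\eta$ and differentiating $P_{\theta,\eta}\dot\ell(\theta,\theta,\eta)$ relative
to $\theta$ yields
$P_{\theta,\eta}\dot\ell_{\theta,\eta}\dot\ell(\theta,\theta,\eta)^T + P_{\theta,\eta}\ddot\ell(\theta,\theta,\eta) + (\partial/\partial t)|_{t=\theta}P_{\theta,\eta}\dot\ell(\theta,t,\eta) = 0$,
since $P_{\theta,\eta}\dot\ell(\theta,\theta,\eta) = 0$ for every $(\theta,\eta)$,
and since we can choose $(\theta,\eta) = (\theta_0,\eta_0)$. One way to verify (9)
is to write

$$P\dot\ell(\theta_0,\theta_0,\eta) = P\left[\frac{p_0 - p_{\theta_0,\eta}}{p_0}\big(\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell(\theta_0,\theta_0,\eta_0)\big)\right] - P\left[\dot\ell(\theta_0,\theta_0,\eta_0)\left(\frac{p_{\theta_0,\eta} - p_0}{p_0} - A_0(\eta - \eta_0)\right)\right],$$

where $A_0 = A_{\theta_0,\eta_0}$ and $A_{\theta,\eta}$ is the score operator for
$\eta$ at $(\theta,\eta)$, for example, the Frechet derivative of
$\log p_{\theta,\eta}$ relative to $\eta$. Thus, if the $L_2$-norm or one of its
dominating norms is applied to $\eta$, it suffices to show, under the given
regularity conditions, Frechet differentiability of
$\eta \mapsto \dot\ell(\theta_0,\theta_0,\eta)$ plus second-order Frechet
differentiability of $\eta \mapsto lik(\theta_0,\eta)$. Note that (9) is
naturally satisfied for the semiparametric models with convex linearity, in
which $P\dot\ell(\theta_0,\theta_0,\eta)$ is exactly zero.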
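
The decomposition used to verify (9) can be derived in a few lines. The sketch below is a reconstruction under stated assumptions, not a quotation of the original argument: it assumes that $P_{\theta_0,\eta}$ has a density $p_{\theta_0,\eta}$ dominated by the true density $p_0$, uses the identity $P_{\theta,\eta}\dot\ell(\theta,\theta,\eta) = 0$ stated above, and uses the orthogonality of $\dot\ell_0 = \tilde\ell_0$ to the nuisance scores. Here $P$ denotes expectation under the true distribution.

```latex
\begin{align*}
P\dot\ell(\theta_0,\theta_0,\eta)
  &= (P - P_{\theta_0,\eta})\,\dot\ell(\theta_0,\theta_0,\eta)
     && \text{since } P_{\theta_0,\eta}\dot\ell(\theta_0,\theta_0,\eta) = 0 \\
  &= P\Big[\tfrac{p_0 - p_{\theta_0,\eta}}{p_0}\,
       \dot\ell(\theta_0,\theta_0,\eta)\Big] \\
  &= P\Big[\tfrac{p_0 - p_{\theta_0,\eta}}{p_0}\,
       \big(\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell_0\big)\Big]
     - P\Big[\dot\ell_0\,\tfrac{p_{\theta_0,\eta} - p_0}{p_0}\Big] \\
  &= P\Big[\tfrac{p_0 - p_{\theta_0,\eta}}{p_0}\,
       \big(\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell_0\big)\Big]
     - P\Big[\dot\ell_0\Big(\tfrac{p_{\theta_0,\eta} - p_0}{p_0}
       - A_0(\eta - \eta_0)\Big)\Big],
\end{align*}
% The last step adds P[\dot\ell_0 A_0(\eta - \eta_0)] = 0, which holds because
% the efficient score is orthogonal to every nuisance score.
```

Each of the two terms is then a product of two quantities that are small when $\eta$ is close to $\eta_0$, which is the source of the second-order bound $O(\|\eta - \eta_0\|^2)$ in (9).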

Finally we assume that the following empirical process conditions hold 
for (t,9,rj) in some neighborhood of their true values: 

Empirical process assumptions: 

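C1. There exists some neighborhood $V$ of $(\theta_0,\theta_0,\eta_0)$ in
$\Theta \times \Theta \times \mathcal{H}$ such that the classes of functions
$\{(\ddot\ell(t,\theta,\eta))_{i,j}(x) : (t,\theta,\eta) \in V\}$ and
$\{(\ell_{t,\theta}(t,\theta,\eta))_{i,j}(x) : (t,\theta,\eta) \in V\}$ are
P-Donsker and

$$\{(\ell^{(3)}(t,\theta,\eta))_{i,j,k}(x) : (t,\theta,\eta) \in V\}$$

is P-Glivenko-Cantelli, for every $i,j,k = 1,\ldots,d$.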

One basic method of showing that a class of functions is P-Donsker or 
P-Glivenko-Cantelli involves calculating its (bracketing) entropy number. 
However this verification can be simplified by building up Glivenko-Cantelli 
(Donsker) classes from other Glivenko-Cantelli (Donsker) classes by em- 
ploying preservation techniques in Sections 9.3 and 9.4 of [11]. Also, every 
P-Donsker class $\mathcal{F}$ with integrable envelope function is P-Glivenko-Cantelli.



2.4. Rates of convergence. The estimation accuracy of the profile sampler
method depends mainly on the convergence rate of the estimated nuisance
parameter, that is, the value of $r$. We now present two results, Theorem 1
and Lemma 1 below, that are useful for determining this rate. These results
are Theorem 3.2 and Lemma 3.3 in [17], and the proofs can be found therein.
Theorem 1 is an extension from general results on M-estimators to
semiparametric M-estimators with nuisance parameters. In Theorem 1,
$d_\theta^2(\eta,\eta_0)$ may be thought of as the square of a distance, but the
theorem is also true for arbitrary functions
$\eta \mapsto d_\theta^2(\eta,\eta_0)$. Let $(\Omega,\mathcal{A},P)$ be an
arbitrary probability space and $T : \Omega \mapsto \mathbb{R}$ an arbitrary map.
Then we use the notations $E^*T$ and $O_P^*(1)$ to represent the outer integral
of $T$ with respect to $P$ and boundedness in outer probability, respectively
(see page 6 in [23]).
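
Theorem 1. Assume for any given $\theta \in \Theta_n$, that $\hat\eta_\theta$
satisfies $\mathbb{P}_n m_{\theta,\hat\eta_\theta} \geq \mathbb{P}_n m_{\theta,\eta_0}$
for given measurable functions $x \mapsto m_{\theta,\eta}(x)$. Assume conditions
(10) and (11) below hold for every $\theta \in \Theta_n$, every $\eta \in V_n$ and
every $\varepsilon > 0$:

(10) $P(m_{\theta,\eta} - m_{\theta,\eta_0}) \lesssim -d_\theta^2(\eta,\eta_0) + \|\theta - \theta_0\|^2$,

(11) $E^*\sup_{\theta \in \Theta_n,\ \eta \in V_n,\ \|\theta - \theta_0\| < \varepsilon,\ d_\theta(\eta,\eta_0) < \varepsilon}|\mathbb{G}_n(m_{\theta,\eta} - m_{\theta,\eta_0})| \lesssim \phi_n(\varepsilon)$.

Suppose that (11) is valid for functions $\phi_n$ such that
$\delta \mapsto \phi_n(\delta)/\delta^\alpha$ is decreasing for some $\alpha < 2$
and sets $\Theta_n \times V_n$ such that
$P(\tilde\theta \in \Theta_n, \hat\eta_{\tilde\theta} \in V_n) \to 1$. Then
$d_{\tilde\theta}(\hat\eta_{\tilde\theta},\eta_0) \leq O_P^*(\delta_n + \|\tilde\theta - \theta_0\|)$
for any sequence of positive numbers $\delta_n$ such that
$\phi_n(\delta_n) \leq \sqrt{n}\delta_n^2$ for every $n$.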

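Lemma 1 below is useful for verifying the continuity modulus condition (11)
for the empirical process. Define
$\mathcal{S}_\delta = \{x \mapsto m_{\theta,\eta}(x) - m_{\theta,\eta_0}(x) : d_\theta(\eta,\eta_0) < \delta, \|\theta - \theta_0\| < \delta\}$
and

(12) $K(\delta,\mathcal{S}_\delta,L_2(P)) = \int_0^\delta\sqrt{1 + H_B(\varepsilon,\mathcal{S}_\delta,L_2(P))}\,d\varepsilon$,

where $H_B$ denotes the log of the bracketing entropy number.

Lemma 1. Suppose that the functions $(x,\theta,\eta) \mapsto m_{\theta,\eta}(x)$
are uniformly bounded for $(\theta,\eta)$ ranging over a neighborhood of
$(\theta_0,\eta_0)$ and that

(13) $P(m_{\theta,\eta} - m_{\theta,\eta_0})^2 \lesssim d_\theta^2(\eta,\eta_0) + \|\theta - \theta_0\|^2$.

Then condition (11) is satisfied for any functions $\phi_n$ such that

$$\phi_n(\delta) \geq K(\delta,\mathcal{S}_\delta,L_2(P))\left(1 + \frac{K(\delta,\mathcal{S}_\delta,L_2(P))}{\delta^2\sqrt{n}}\right).$$

Consequently, we may replace $\phi_n(\delta)$ with
$K(\delta,\mathcal{S}_\delta,L_2(P))$ in the conclusion of the previous theorem.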

3. Second-order asymptotic expansion. In this section, second-order
asymptotic expansions of the log-profile likelihood and maximum likelihood
estimator are derived. Their second-order accuracy is proven to be dependent
on the order of the convergence rate of the nuisance parameter through
the rate function $M_n(r)$ given in the Introduction. Note that the smallest
order of $O_P(M_n(r))$, namely $O_P(n^{-1/2})$, is achieved when the nuisance
parameter has parametric or faster rate by the truncation property of the
function $M_n(r)$. The assumptions in Section 2 are assumed throughout.

Theorem 2. If $\tilde\theta_n$ satisfies $(\tilde\theta_n - \hat\theta_n) = o_P(1)$, then

(14) $\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n\tilde I_0^{-1}\tilde\ell_0(X_i) + O_P(M_n(r))$,

(15) $\log pl_n(\tilde\theta_n) = \log pl_n(\hat\theta_n) - \frac{1}{2}n(\tilde\theta_n - \hat\theta_n)^T\tilde I_0(\tilde\theta_n - \hat\theta_n) + O_P(g_r(\|\tilde\theta_n - \hat\theta_n\|))$,

where
$g_r(w) \equiv (nw^3 + n^{1-r}w^2 + n^{1-2r}w + n^{-2r+1/2})1\{1/4 < r < 1/2\} + (nw^3 + n^{-1/2})1\{r \geq 1/2\}$.

Remark 1. Under regularity conditions, the counterpart of (14) in fully
parametric models has error of order $O_P(n^{-1/2})$, which agrees with
$O_P(M_n(r))$ when $r \geq 1/2$. Thus, we achieve the parametric bound in
semiparametric models only when the nuisance parameter obtains the parametric
rate. We also observe a monotonic increase in the error rate as $r$ decreases
toward 1/4.

The asymptotic quadratic expansion (15) can be used to construct an
estimator of the standard error of $\hat\theta_n$. The estimator is the following
"discretized" version of the observed profile information matrix, $\hat I_n$, which
is the derivative of the profile likelihood (see [17]):

(16) $\hat I_n(v) = -2\, \frac{\log pl_n(\hat\theta_n + s_n v) - \log pl_n(\hat\theta_n)}{n s_n^2},$

where direction $v \in \mathbb{R}^d$ and step size $s_n \to 0$. The expansion (15) implies

(17) $v^T \tilde I_0 v = \hat I_n(v) + O_P(h_r(|s_n|)),$

where $h_r(|s_n|) = g_r(|s_n|) / (n s_n^2)$. By straightforward analysis, the smallest
order of the error term in (17) is $O_P(n^{-r})$, obtained by setting the step size
$s_n = O_P(n^{-r})$ and $s_n^{-1} = O_P(n^r)$ when $1/4 < r < 1/2$. However, when $r \ge 1/2$,
the smallest order of error in (17) stabilizes at $O_P(n^{-1/2})$, obtained by setting
the step size to $s_n = O_P(n^{-1/2})$ and $s_n^{-1} = O_P(n^{1/2})$. In other words, $\hat I_n$
can only be a $\sqrt{n}$-consistent estimator of $\tilde I_0$ when the convergence rate of
the nuisance parameter is faster than or equal to the parametric rate.
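
The step-size trade-off behind (17) can be checked numerically. The following is a minimal Python sketch rather than part of the original analysis: the function names are illustrative, all constants hidden in the O_P terms are set to one, and the branch boundary at r >= 1/2 is an assumption read off the definition of g_r in Theorem 2.

```python
def g_r(w, n, r):
    """Remainder order g_r(w) from Theorem 2 for a rate exponent r > 1/4."""
    if r >= 0.5:
        return n * w**3 + n**-0.5
    return n * w**3 + n**(1 - r) * w**2 + n**(1 - 2 * r) * w + n**(-2 * r + 0.5)


def h_r(s, n, r):
    """Error order h_r(|s_n|) = g_r(|s_n|) / (n s_n^2) appearing in (17)."""
    s = abs(s)
    return g_r(s, n, r) / (n * s**2)


n = 10**6
# Slow nuisance rate (r = 1/3): compare a step of order n^{-r} with one of order n^{-1/2}.
slow_at_rate = h_r(n ** (-1 / 3), n, 1 / 3)
slow_at_root_n = h_r(n**-0.5, n, 1 / 3)
# Parametric nuisance rate (r = 1/2): a step of order n^{-1/2}.
fast_at_root_n = h_r(n**-0.5, n, 0.5)
```

With these conventions, the step $n^{-r}$ gives $h_r = 3n^{-r} + n^{-1/2}$ when $r = 1/3$, while the step $n^{-1/2}$ gives $2n^{-1/2}$ when $r = 1/2$, in line with the orders stated above.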

The above analysis also leads to good discretized estimators for each
element in $\tilde I_0$. For instance, with $e_i$ denoting the $i$th unit vector in
$\mathbb{R}^d$, we can deduce

(18) $(\hat I_n)_{ij} = -\frac{\log pl_n(\hat\theta_n + e_i s_n + e_j s_n) + \log pl_n(\hat\theta_n)}{n s_n^2} + \frac{\log pl_n(\hat\theta_n + e_i s_n) + \log pl_n(\hat\theta_n + e_j s_n)}{n s_n^2},$

(19) $(\tilde I_0)_{ij} = (\hat I_n)_{ij} + O_P(h_r(|s_n|)).$
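
The elementwise second difference in (18) is straightforward to code. The sketch below is illustrative only: `discretized_information` and the quadratic `toy_log_pl` are hypothetical names, and the toy log profile likelihood is chosen to be exactly quadratic so that the estimator recovers the information matrix without remainder.

```python
def discretized_information(log_pl, theta_hat, s, n):
    """Second-difference estimate of the information matrix, following (18).

    log_pl    : callable taking a list of floats, returning the log profile likelihood
    theta_hat : maximizer of log_pl (list of floats)
    s, n      : step size s_n and sample size n
    """
    d = len(theta_hat)

    def shifted(*coords):
        # Evaluate log_pl at theta_hat + s * (sum of unit vectors in coords).
        theta = list(theta_hat)
        for c in coords:
            theta[c] += s
        return log_pl(theta)

    base = log_pl(theta_hat)
    info = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            num = shifted(i, j) + base - shifted(i) - shifted(j)
            info[i][j] = -num / (n * s**2)
    return info


# Toy check (assumed setup): log pl_n(theta) = -(n/2) (theta - theta_hat)^T I (theta - theta_hat).
n = 200
true_info = [[2.0, 0.5], [0.5, 1.0]]
theta_hat = [1.0, -0.5]


def toy_log_pl(theta):
    diff = [t - th for t, th in zip(theta, theta_hat)]
    quad = sum(diff[a] * true_info[a][b] * diff[b] for a in range(2) for b in range(2))
    return -0.5 * n * quad


estimate = discretized_information(toy_log_pl, theta_hat, s=n**-0.5, n=n)
```

For a real semiparametric model each call to `log_pl` requires maximizing over the nuisance parameter, so the four evaluations per entry dominate the cost.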

4. Main results and implications. We now present the main results on
the posterior profile distribution. Let $\tilde P_{\theta|\tilde X}$ be the posterior profile
distribution of $\theta$ with respect to the prior $p(\theta)$ given the data $\tilde X = (X_1, \ldots, X_n)$.
Define $\Delta_n(\theta) = n^{-1}\{\log pl_n(\theta) - \log pl_n(\hat\theta_n)\}$. We now present the first main
result:

Theorem 3. Suppose the assumptions of Section 2 hold and also that

(20) $\Delta_n(\tilde\theta_n) = o_P(1)$ implies that $\tilde\theta_n = \theta_0 + o_P(1)$,

for any sequence $\tilde\theta_n \in \Theta$. If the proper prior satisfies $p(\theta_0) > 0$ and $p(\cdot)$ has a
continuous and finite first-order derivative in some neighborhood of $\theta_0$, then

(21) $\sup_{\xi \in \mathbb{R}^d} |\tilde P_{\theta|\tilde X}(\sqrt{n}\, \tilde I_0^{1/2} (\theta - \hat\theta_n) \le \xi) - \Phi_d(\xi)| = O_P(M_n(r)).$

Remark 2. Based on the conclusions of Theorem 3, we know that
the $[1 - \alpha + O_P(M_n(r))]$th one-sided and two-sided credible sets for the
vector $\theta$ from the profile sampler are $(-\infty, \hat\theta_n + n^{-1/2} I^{-1/2} z_{1-\alpha}]$ and
$[\hat\theta_n - n^{-1/2} I^{-1/2} z_{1-\alpha/2}, \hat\theta_n + n^{-1/2} I^{-1/2} z_{1-\alpha/2}]$, respectively, where $z_\alpha$ is a
standard normal $\alpha$th quantile for $d$ dimensions and $I$ can be either $\tilde I_0$ or $\hat I_n$.
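
For a scalar parameter the sets in Remark 2 reduce to familiar normal-quantile bounds. The following sketch covers only the one-dimensional case; the function name and the numerical inputs are hypothetical, and `info` stands for whichever information estimate is plugged in.

```python
from statistics import NormalDist


def credible_bounds(theta_hat, info, n, alpha=0.05):
    """One-dimensional version of the one-sided and two-sided sets in Remark 2."""
    scale = (n * info) ** -0.5  # n^{-1/2} I^{-1/2}
    z_one = NormalDist().inv_cdf(1 - alpha)
    z_two = NormalDist().inv_cdf(1 - alpha / 2)
    one_sided_upper = theta_hat + scale * z_one
    two_sided = (theta_hat - scale * z_two, theta_hat + scale * z_two)
    return one_sided_upper, two_sided


# Illustrative inputs only.
upper, (lo, hi) = credible_bounds(theta_hat=1.0, info=4.0, n=100)
```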

The following two corollaries provide several interesting additional second- 
order properties of the profile sampler: 

Corollary 1. Assume the conditions of Theorem 3, and let $f_n(\cdot)$ be
the posterior profile density of $\sqrt{n}\varrho_n$ relative to the prior $p(\theta)$. Then

(22) $f_n(\xi) = \phi_d(\xi) + O_P(M_n(r)).$

Corollary 2. Under the conditions of Theorem 3 and recalling that
$\varrho_n = \tilde I_0^{1/2}(\theta - \hat\theta_n)$, we have that if $\theta$ has finite second absolute moment, then

(23) $\hat\theta_n = \tilde E_{\theta|\tilde X}(\theta) + O_P(n^{-1/2} M_n(r)),$

(24) $\tilde I_0 = n^{-1} (\widetilde{Var}_{\theta|\tilde X}(\theta))^{-1} + O_P(M_n(r)),$

where $\tilde E_{\theta|\tilde X}(\theta)$ and $\widetilde{Var}_{\theta|\tilde X}(\theta)$ are the posterior profile mean and posterior
profile covariance matrix, respectively.

Remark 3. The posterior moments in Corollary 2 are with respect to
the posterior profile distribution. Thus we can estimate $\hat\theta_n$ with the mean
of the profile sampler. Similarly, (each element of) the efficient information
matrix can be estimated by (the corresponding element of) the inverse of the
covariance matrix of the profile sampler with an error of order $O_P(M_n(r))$.
Clearly, a faster convergence rate of the nuisance parameter leads to higher
estimation accuracy. We can generalize the arguments used in the proof of
Corollary 2 to obtain general results on the posterior moments. For simplicity,
assume $\theta$ is one dimensional. Then, provided $\int |\theta|^\beta p(\theta)\, d\theta < \infty$, we
have $\tilde E_{\theta|\tilde X} \varrho_n^\beta = n^{-\beta/2} E U^\beta + O_P(n^{-(\beta+1)/2} + n^{(-2r+1)-(\beta+1)/2})$, where $\tilde E_{\theta|\tilde X} \varrho_n^\beta$
is the $\beta$th posterior moment of $\varrho_n$ and $U \sim N(0, 1)$.

Remark 4. We now have two approaches to estimating the efficient
information matrix $\tilde I_0$. One approach is by numerical analysis as given in (19).
Another approach is presented in Corollary 2 as an estimate from the
posterior distribution. We prefer estimating $\tilde I_0$ with (24) using the profile sampler
procedure in semiparametric models with $r \ge 1/2$ since this avoids the issue
of choosing the step size in (17) or (19). However, for models with $r < 1/2$,
the numerical differentiation approach may be worthwhile because of the
smaller error rate that may be obtained using (17).
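
The moment-based route of Corollary 2 amounts to two summary statistics of the sampler output. The sketch below is a hedged illustration for scalar $\theta$: the draws are simulated from a normal distribution standing in for a real profile sampler chain, and every numerical value is made up for the example.

```python
import random
from statistics import fmean, variance


def moments_from_draws(draws, n):
    """Estimate theta by the sampler mean, as in (23), and the information by
    the inverse of n times the sampler variance, as in (24); scalar case only."""
    theta_est = fmean(draws)
    info_est = 1.0 / (n * variance(draws))
    return theta_est, info_est


# Stand-in for sampler output: draws from N(theta_hat, 1 / (n * info)).
rng = random.Random(0)
n, theta_hat, info = 100, 1.0, 4.0
sd = (n * info) ** -0.5
draws = [rng.gauss(theta_hat, sd) for _ in range(20000)]
theta_est, info_est = moments_from_draws(draws, n)
```

No step size enters this computation, which is the practical advantage noted in Remark 4.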

Combining (14) and (23), we obtain

$\sqrt{n}(\tilde E_{\theta|\tilde X}(\theta) - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde I_0^{-1} \tilde\ell_0(X_i) + O_P(M_n(r)).$

The range of $r$ implies that the mean value of the profile sampler is essentially
a semiparametric efficient estimator of $\theta$ even when the nuisance parameter
has a slower convergence rate. A similar conclusion appears to hold for
other estimators of $\hat\theta_n$ based on the profile sampler, including multivariate
generalizations of the median.

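The second main result is expressed in the following Theorem 4. An $\alpha$th
quantile of the posterior profile distribution is any quantity $\tau_{n\alpha} \in \mathbb{R}^d$ that
satisfies $\tau_{n\alpha} = \inf\{\xi : \tilde P_{\theta|\tilde X}(\theta \le \xi) \ge \alpha\}$, where $\xi$ is an infimum over the given
set only if there does not exist a $\xi_1 < \xi$ in $\mathbb{R}^d$ such that $\tilde P_{\theta|\tilde X}(\theta \le \xi_1) \ge \alpha$.
Because of the assumed smoothness of both the prior and the likelihood in
our setting, we can, without loss of generality, assume $\tilde P_{\theta|\tilde X}(\theta \le \tau_{n\alpha}) = \alpha$.
We can also define $\kappa_{n\alpha} = \sqrt{n}(\tau_{n\alpha} - \hat\theta_n)$, that is, $\tilde P_{\theta|\tilde X}(\sqrt{n}(\theta - \hat\theta_n) \le \kappa_{n\alpha}) = \alpha$.
Note that neither $\tau_{n\alpha}$ nor $\kappa_{n\alpha}$ is unique if the dimension of $\theta$ is larger than
one. Nevertheless, the following theorem ensures that for each choice of $\kappa_{n\alpha}$
there exists a unique $\hat\kappa_{n\alpha}$ based on the data such that $P(\sqrt{n}(\hat\theta_n - \theta_0) \le
\hat\kappa_{n\alpha}) = \alpha$ and $\|\hat\kappa_{n\alpha} - \kappa_{n\alpha}\| = O_P(M_n(r))$: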

Theorem 4. Under the conditions of Theorem 3 and assuming that
$\tilde\ell_0(X)$ has finite third moment with a nondegenerate distribution, there
exists a $\hat\kappa_{n\alpha}$ based on the data such that $P(\sqrt{n}(\hat\theta_n - \theta_0) \le \hat\kappa_{n\alpha}) = \alpha$ and
$\hat\kappa_{n\alpha} - \kappa_{n\alpha} = O_P(M_n(r))$ for each chosen $\kappa_{n\alpha}$.

Remark 5. Clearly, a faster convergence rate of the nuisance parameter
leads to a more accurate estimate of the confidence interval when $1/4 < r <
1/2$. The profile sampler procedure can provide the best estimate for the
boundary of the confidence interval in semiparametric models when $r \ge 1/2$.
We conjecture that the product of $\sqrt{n}\, 1\{r \ge 1/2\} + n^{2r-1/2}\, 1\{1/4 < r < 1/2\}$
and the $O_P(M_n(r))$ term in Theorem 4 converges to the product of two
different nontrivial but uniformly integrable Gaussian processes. Thus we
believe the convergence rate in Theorem 4 is optimal.

Theorem 4 states that the Wald-type confidence interval can be
approximated by the credible set of the same type based on the profile sampler
with error of order $O_P(M_n(r))$. In other words, the boundary of a one-sided
confidence interval for $\theta$ at level $\alpha$ can be estimated by the $\alpha$th quantile
of the profile sampler with error of order $O_P(n^{-1/2} M_n(r))$. Similar conclusions
also hold for the confidence interval obtained by inverting the profile
likelihood ratio, as will be shown in Theorem 5 below.

The profile likelihood ratios in the frequentist and Bayesian set-ups are
defined as $PLR_f(\theta_0) = 2(\log pl_n(\hat\theta_n) - \log pl_n(\theta_0))$ and $PLR_b(\theta) =
2(\log pl_n(\hat\theta_n) - \log pl_n(\theta))$, respectively. Thus $\chi^b_{n\alpha}$ is defined as
$\chi^b_{n\alpha} = \inf\{\xi : \tilde P_{\theta|\tilde X}(PLR_b(\theta) \le \xi) \ge \alpha\}$. As argued previously, we can,
without loss of generality, assume that $\tilde P_{\theta|\tilde X}(PLR_b(\theta) \le \chi^b_{n\alpha}) = \alpha$. The following
theorem ensures that there exists a $\chi^f_{n\alpha}$ based on the data such that
$P(PLR_f(\theta_0) \le \chi^f_{n\alpha}) = \alpha$ and $\chi^f_{n\alpha} - \chi^b_{n\alpha} = O_P(M_n(r))$:

Theorem 5. Under the conditions of Theorem 4, there exists a $\chi^f_{n\alpha}$
based on the data such that $P(PLR_f(\theta_0) \le \chi^f_{n\alpha}) = \alpha$ and $\chi^f_{n\alpha} - \chi^b_{n\alpha} =
O_P(M_n(r))$.

Remark 6. The corresponding $\alpha$-level confidence interval and credible
set obtained by inverting the profile likelihood ratio can be expressed as
$C^f_\alpha(\tilde X) = \{\theta \in \Theta : PLR_f(\theta) \le \chi^f_{n\alpha}\}$ and $C^b_\alpha(\tilde X) = \{\theta \in \Theta : PLR_b(\theta) \le \chi^b_{n\alpha}\}$,
respectively. Moreover, the proof of Theorem 5 implies that $\chi^b_{n\alpha} = \chi^2_{d,\alpha} +
O_P(M_n(r))$ and $\chi^f_{n\alpha} = \chi^2_{d,\alpha} + O_P(M_n(r))$, where $\chi^2_{d,\alpha}$ denotes the $\alpha$th quantile
of the central chi-square distribution with $d$ degrees of freedom. Theorem 5
also implies that $\tilde P_{\theta|\tilde X}(PLR_b(\theta) \le \chi^2_{d,\alpha}) = \alpha + O_P(M_n(r))$. Thus it appears
in this instance that not much is gained by using the posterior profile sampler
to calibrate the likelihood ratio confidence interval instead of simply
using $\chi^2_{d,\alpha}$.
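
The comparison in Remark 6 can be illustrated in one dimension, where the chi-square quantile with one degree of freedom equals the square of a normal quantile. The sketch below rests on stated assumptions: the log profile likelihood is taken to be exactly quadratic, the posterior draws are simulated rather than produced by a real sampler, and all names and numbers are illustrative.

```python
import random
from statistics import NormalDist


def chi2_quantile_1df(alpha):
    """alpha-quantile of the chi-square distribution with 1 degree of freedom."""
    return NormalDist().inv_cdf((1 + alpha) / 2) ** 2


def empirical_quantile(values, alpha):
    """Simple order-statistic quantile of a list of numbers."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, int(alpha * len(ordered)))
    return ordered[k]


rng = random.Random(1)
n, theta_hat, info = 100, 1.0, 4.0
draws = [rng.gauss(theta_hat, (n * info) ** -0.5) for _ in range(20000)]
# For a quadratic log profile likelihood, PLR_b(theta) = n * info * (theta - theta_hat)^2.
plr_b = [n * info * (t - theta_hat) ** 2 for t in draws]
chi_b = empirical_quantile(plr_b, 0.95)  # sampler-based calibration
chi_ref = chi2_quantile_1df(0.95)        # chi-square calibration
```

In this idealized setting the two calibrations differ only by Monte Carlo error, consistent with the remark that little is gained from the sampler-based quantile.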

5. Examples. This section illustrates the practicality of the stated
conditions by verifying that these assumptions are satisfied for each of the three
examples introduced in Section 2. Some simulation results for the Cox
regression model are also presented.

5.1. The Cox model with right-censored data. Note that this example
was considered fully in [4], but we include some of the main ideas here for
completeness. We first verify the smoothness conditions B1 and the
empirical process assumptions C1. Under regularity conditions, B1 can be
easily satisfied since the maps $(t,\theta,\eta) \mapsto (\partial^{l+m}/\partial t^l \partial\theta^m)\ell(t,\theta,\eta)$, whose forms
can be found in [4], are uniformly bounded around $(\theta_0,\theta_0,\Lambda_0)$. Notice that
the functions $y \mapsto h_0(y)$, $y \mapsto \Lambda_t(y)$ and $z \mapsto \exp(zt)$ for $(t,\theta,\Lambda)$ in the
assumed neighborhood of the true values are P-Donsker. Thus we can verify
C1 by repeatedly employing the Lipschitz continuity preservation property
of Donsker classes. The remaining smoothness conditions B2 and
condition (20) are separately verified by Lemmas 2 and 3 of [4].

5.2. The Cox model with current status data. In this section we verify 
the regularity conditions for the Cox model with current status data as well 
as present a small simulation study to gain insight into the moderate sample 
size agreement with the asymptotic theory. 

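5.2.1. Verification of conditions. We can verify that $\ell(t,\theta,\Lambda)$ defined in
Section 2.2.2 above is indeed the least favorable submodel since $\dot\ell(t,\theta,\Lambda) =
(z\Lambda_t(\theta,\Lambda)(y) - \phi(\Lambda(y)) h_{00} \circ \Lambda_0^{-1} \circ \Lambda(y)) Q(x; t, \Lambda_t(\theta,\Lambda))$, evaluated at $t = \theta =
\theta_0$ and $\Lambda = \Lambda_0$, is the efficient score function $(z\Lambda_0(y) - h_{00}(y)) Q(x; \theta_0, \Lambda_0)$.
Note that we extend the domain of the function $u \mapsto \Lambda_0^{-1}(u)$ to all of $[0, \infty)$
by assigning the value $\sigma$ to all $u \in [0, \Lambda_0(\sigma))$ and the value $\tau$ to all $u > \Lambda_0(\tau)$.
Substituting $\theta = t$ and $\Lambda = \Lambda_t(\theta,\Lambda)$ in (3) and differentiating with respect
to $t$ and $\theta$, we obtain

$\dot\ell(t,\theta,\Lambda)(x) = (z\Lambda_t + \dot\Lambda_t) Q(x; t, \Lambda_t),$

$\ddot\ell(t,\theta,\Lambda)(x) = \frac{\partial^2 lik(t,\Lambda_t(\theta,\Lambda))/\partial t^2}{lik(t,\Lambda_t(\theta,\Lambda))} - \dot\ell^2(t,\theta,\Lambda)(x),$

where

$\frac{\partial^2 lik(t,\Lambda_t(\theta,\Lambda))/\partial t^2}{lik(t,\Lambda_t(\theta,\Lambda))} = Q(x; t, \Lambda_t) \times [z^2 \Lambda_t + 2z\dot\Lambda_t - e^{tz}(z\Lambda_t + \dot\Lambda_t)^2].$

Note that $\dot\ell(t,\theta,\Lambda)$ can be written as follows:

$\dot\ell(t,\theta,\Lambda) = \left(z + \frac{\dot\Lambda_t}{\Lambda_t}\right) \Lambda_t(\theta,\Lambda)(y)\, Q(x; t, \Lambda_t(\theta,\Lambda)),$

and the map $u \mapsto u e^{-u}/(1 - e^{-u})$ is bounded and Lipschitz on $[0, \infty)$. Thus
we can write $\Lambda(y) Q(x; t, \Lambda) = \psi(e^{tz}, \Lambda(y))$, where the function $\psi$ is bounded
and Lipschitz in each argument. Next, note that

$\frac{\dot\Lambda_t}{\Lambda_t} = -\frac{\phi(\Lambda)\, h_{00} \circ \Lambda_0^{-1} \circ \Lambda}{\Lambda_t(\theta,\Lambda)} = -\frac{(\phi(\Lambda)/\Lambda)\, h_{00} \circ \Lambda_0^{-1} \circ \Lambda}{1 + (\theta - t)(\phi(\Lambda)/\Lambda)\, h_{00} \circ \Lambda_0^{-1} \circ \Lambda}.$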
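
Combining this with the facts that the function $\phi(y)/y$ is bounded and
$h_{00} \circ \Lambda_0^{-1}$ is bounded by assumption, we obtain that $\dot\ell(t,\theta,\Lambda)$ is bounded.
Clearly, $\ddot\ell(t,\theta,\Lambda)$ is also uniformly bounded based on the following equation:

$\frac{\partial^2 lik(t,\Lambda_t(\theta,\Lambda))/\partial t^2}{lik(t,\Lambda_t(\theta,\Lambda))} = Q(x; t, \Lambda_t) \Lambda_t \left[ z^2 + 2z\frac{\dot\Lambda_t}{\Lambda_t} - e^{tz} \Lambda_t \left( z^2 + 2z\frac{\dot\Lambda_t}{\Lambda_t} + \left(\frac{\dot\Lambda_t}{\Lambda_t}\right)^2 \right) \right].$

By similar analysis, $\ell_{t,\theta}(t,\theta,\Lambda)$, $\ell^{(3)}(t,\theta,\Lambda)$, $\ell_{t,t,\theta}(t,\theta,\Lambda)$ and $\ell_{t,\theta,\theta}(t,\theta,\Lambda)$,
whose concrete forms can be found in [3], are also uniformly bounded for all
$t$ sufficiently close to $\theta$ and all $\Lambda$ varying over the parameter space.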

We next verify assumption C1. Recall that $\Lambda(y) Q(x; t, \Lambda) = \psi(e^{tz}, \Lambda(y))$,
where the function $\psi$ is bounded and Lipschitz in each argument. Thus,
since the classes of functions $z \mapsto e^{tz}$ and $y \mapsto \Lambda(y)$ are Donsker, so is the
class of functions $x \mapsto \Lambda(y) Q(x; t, \Lambda)$. Note that

$\frac{\phi(\Lambda)}{\Lambda_t(\theta,\Lambda)} = \frac{\varsigma(\Lambda)}{1 + (\theta - t)\varsigma(\Lambda)\upsilon(\Lambda)} \equiv \chi(\Lambda),$

where $\varsigma(\Lambda) = \phi(\Lambda)/\Lambda$ and $\upsilon(\Lambda) = h_{00} \circ \Lambda_0^{-1} \circ \Lambda$, and both $\varsigma(\Lambda)$ and $\upsilon(\Lambda)$ are
Lipschitz according to the assumptions. Hence $\chi(\Lambda)$ is also Lipschitz in $\Lambda$.
Thus the class of functions $\dot\ell(t,\theta,\Lambda)$ with $(t,\theta)$ varying over a small
neighborhood of $(\theta_0,\theta_0)$ and $\Lambda$ ranging over all nondecreasing cadlag functions with
domain $[\sigma,\tau]$ and range $[0,M]$ can be seen to be a Donsker class. By repeated
application of the above techniques to $(\partial^2 lik(t,\Lambda_t(\theta,\Lambda))/\partial t^2)/lik(t,\Lambda_t(\theta,\Lambda))$
we know the class of functions $\ddot\ell(t,\theta,\Lambda)$ is also Donsker. Similarly, the classes
of functions £ t> o(t,6,A) and £^(t,6,A) with (t,6) varying over a small neigh- 
borhood of (#cb$o) an d A ranging over all nondecreasing cadlag functions 
with domain [a, r] and range [0,M] can be shown to be Donsker. More- 
over, £( 3 \t,6,A) is automatically P-Glivenko-Cantelli since it is uniformly 
bounded based on the previous analysis. The following lemmas verify the 
remaining assumptions: 

Lemma 2. Under the above set-up for the Cox model with current status 
data, assumption B2 is satisfied. 

Lemma 3. Under the above set-up for the Cox model with current status 
data, condition (20) is satisfied. 



5.2.2. Simulation study. In this subsection, we conduct simulations in
two semiparametric models with different convergence rates, that is, Cox
regression with right-censored data and Cox regression with current status
data. The contrast between the two simulations agrees with our theoretical
results that the accuracy of inferences based on the profile sampler is higher
in semiparametric models with faster convergence rates.

In what follows, the simulations are run for various sample sizes under
a Lebesgue prior. For each sample size, 500 datasets were analyzed. The
event times were generated from (2) with one covariate $Z \sim U[0, 1]$. The
regression coefficient is $\theta = 1$ and $\Lambda(t) = \exp(t) - 1$. The censoring time
$C \sim U[0, t_n]$, where $t_n$ was chosen such that the average effective sample size
over 500 samples is approximately $0.9n$. For each dataset, Markov chains
of length 20,000 with a burn-in period of 5,000 were generated using the
Metropolis algorithm. The jumping density for the coefficient was normal,
centered at the current iteration, with variance tuned to yield an acceptance
rate of 20%-40%. The approximate variance of the estimator of $\theta$ was computed
by numerical differentiation with step size proportional to $n^{-1/2}$ ($n^{-1/3}$) for
right-censored data (current status data) according to (16).
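
The sampling scheme just described can be sketched as a random-walk Metropolis chain targeting the profile likelihood under a flat prior. This is a minimal illustration and not the code used for the reported simulations: the quadratic `toy_log_pl` stands in for a real Cox profile likelihood, and the proposal standard deviation is a hand-picked value for this toy target.

```python
import math
import random


def profile_sampler(log_pl, theta_init, proposal_sd, n_iter=20000, burn_in=5000, seed=0):
    """Random-walk Metropolis chain for a scalar theta under a Lebesgue prior."""
    rng = random.Random(seed)
    theta, current = theta_init, log_pl(theta_init)
    kept, accepted = [], 0
    for it in range(n_iter):
        proposal = rng.gauss(theta, proposal_sd)
        candidate = log_pl(proposal)
        # With a flat prior the acceptance ratio is the profile likelihood ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted += 1
        if it >= burn_in:
            kept.append(theta)
    return kept, accepted / n_iter


# Hypothetical quadratic log profile likelihood with maximizer 1.0.
n, theta_hat, info = 100, 1.0, 4.0


def toy_log_pl(theta):
    return -0.5 * n * info * (theta - theta_hat) ** 2


draws, acc_rate = profile_sampler(toy_log_pl, theta_init=0.8, proposal_sd=0.2)
```

In practice `proposal_sd` would be tuned until `acc_rate` falls in the 20%-40% band mentioned above, and each evaluation of `log_pl` would involve a maximization over the nuisance parameter.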

Table 1 (Table 2) summarizes the results from the simulations of Cox
regression with right-censored data (current status data), giving the average
across 500 samples of the maximum likelihood estimate (MLE), mean
of the profile sampler (CM), estimated standard errors based on MCMC
(SE_M), estimated standard errors based on numerical derivatives (SE_N) and
boundaries for the two-sided 95% confidence interval for $\theta$ generated by
numerical differentiation and MCMC. L_M (L_N) and U_M (U_N) denote the
lower and upper bound of the confidence interval from the MCMC chain
(numerical derivative). According to (17), Corollary 2 and Theorem 4, the
terms $n|MLE - CM|$ ($n^{2/3}|MLE - CM|$), $\sqrt{n}|SE_M - SE_N|$ ($n^{1/6}|SE_M - SE_N|$),
$n|L_M - L_N|$ ($n^{2/3}|L_M - L_N|$) and $n|U_M - U_N|$ ($n^{2/3}|U_M - U_N|$) for Cox
regression with right-censored data (current status data) in Table 1 (Table 2)
are bounded in probability. The realizations of these terms summarized
in Tables 1 and 2 clearly illustrate their boundedness. Furthermore, we can
conclude that the profile sampler based on the semiparametric models with
faster convergence rate yields more accurate inferences about $\theta$.

Table 1
Cox regression with right-censored data ($\theta_0 = 1$ and 500 samples)

  n     n|MLE - CM|    sqrt(n)|SE_M - SE_N|    n|L_M - L_N|    n|U_M - U_N|
  50    0.3062         0.2270                  0.1809          1.1212
  100   0.2587         0.0311                  0.5987          0.1301
  200   0.3218         0.0279                  0.4810          0.5253
  500   0.2017         0.2080                  0.7524          0.3518

Table 2
Cox regression with current status data ($\theta_0 = 1$ and 500 samples)

  n     n^{2/3}|MLE - CM|    n^{1/6}|SE_M - SE_N|    n^{2/3}|L_M - L_N|    n^{2/3}|U_M - U_N|
  50    0.4438               0.6144                  3.1799                5.6550
  100   0.6506               0.4071                  0.6162                1.0082
  200   0.7729               0.3284                  0.4617                0.8071
  500   0.7559               0.1611                  0.1071                1.3103

n, sample size; MLE, maximum likelihood estimator; CM, empirical mean; SE_M,
estimated standard errors based on MCMC; SE_N, estimated standard errors based on
numerical derivatives; L_M (U_M), lower (upper) bound of the 95% confidence interval
based on MCMC; L_N (U_N), lower (upper) bound of the 95% confidence interval based
on numerical derivatives.

5.3. Partly linear normal model with current status data. By differenti- 
ating the least favorable model with respect to t or 9, we can obtain 

l(t,9,k) = Q(x;t,k t )(w - h (z)), 



£(t,9,k) = (w-ho(z)f<f> t (1-5) 
w - h (z))h (z)4>t 



£^ 3) {t,9,k) 
ttfi,o{t,9,k) 



(1 - $ t )q t - <fn 
(1 - $t)qt 



6 



qt$t + 



(1-5) 



(l-*t) 



.gt$t + & 
1 *? 



(w-ho(z)) (f) t R(qt{x)), 

(w - h (z)) 2 h (z)(j) t R(q t (x)), 

(w - h (z))hl(z)(j) t R(q t (x)), 



where 



R(qt(x)) 



[1-5) 



qi 



24>tqt 



+ 
(i-^tY



<jj 



1 | 3&ft | 2# 

$, $2 $3 



$q_t = q_{t,k_t(\theta,k)}(x)$, $\phi_t = \phi(q_t)$, and $\Phi_t = \Phi(q_t)$. The convergence rate for the
estimated nuisance parameter is established in Lemma 4 by application of
Theorem 1. The rate r = 2/5 is clearly faster than the cubic rate but slower 
than the parametric rate. Note that Oif is a P-Donsker class by technical 
tool T1 in the Appendix. Assumption C1 can be verified easily by recognizing
that the three classes of functions specified in C1 depend on $(t, \theta, k)$ in a
Lipschitz manner and are uniformly bounded. The remaining assumptions
are verified in Lemmas 5 and 6 below.



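Lemma 4. Under the above set-up for the partly linear normal model
with current status data, we have, for $\tilde\theta_n \to \theta_0$ in probability,

(25) $\|\hat k_{\tilde\theta_n} - k_0\|_{L_2} = O_P(n^{-2/5} + \|\tilde\theta_n - \theta_0\|).$
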
Lemma 5. Under the above set-up for the partly linear normal model
with current status data, assumptions B1 and B2 are satisfied.

Lemma 6. Under the above set-up for the partly linear normal model 
with current status data, condition (20) is satisfied. 

6. Future work. It is clear that the estimation accuracy for $\theta$ in the
profile sampler method is intrinsically determined by the semiparametric model
specifications, specifically by the convergence rate of the nuisance
parameter. Therefore it is very natural to raise a question about how to control the
degree of accuracy. One potential strategy is to profile the penalized likeli- 
hood, whose penalty term is some norm on the nuisance parameter space 
such as the Sobolev norm. We expect that we can adjust the estimation accu- 
racy of the proposed penalized profile sampler by tuning the corresponding 
smoothing parameter. We believe that under certain special model specifi- 
cations, third or higher order semiparametric frequentist inference can be 
constructed by extending the Bartlett correction [1] and objective prior [24] 
results to semiparametric settings. There is a rich literature on the higher 
order properties of posteriors for parametric models and the choice of the 
prior; see, for example, [10, 19, 20]. 

APPENDIX 

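Proof of Theorem 2. We first prove (14). Note that

$0 = P_n\dot\ell(\hat\theta_n, \hat\theta_n, \hat\eta_n) = P_n\dot\ell(\theta_0, \hat\theta_n, \hat\eta_n) + P_n\ddot\ell(\theta_0, \hat\theta_n, \hat\eta_n)(\hat\theta_n - \theta_0)$
$\qquad + \frac{1}{2}(\hat\theta_n - \theta_0)^T \otimes P_n\ell^{(3)}(\theta_n^*, \hat\theta_n, \hat\eta_n) \otimes (\hat\theta_n - \theta_0),$

where $\theta_n^*$ is in between $\theta_0$ and $\hat\theta_n$. By considering Lemma 2.1 below, we can
derive the following:

$0 = n^{-1} \sum_{i=1}^n \tilde\ell_0(X_i) + P\ddot\ell_0 (\hat\theta_n - \theta_0) + n^{-1/2} \tilde I_0 \Delta_{4n}(\theta_0, \hat\theta_n, \hat\eta_n)$ and

$\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde I_0^{-1} \tilde\ell_0(X_i) + \Delta_{4n}(\theta_0, \hat\theta_n, \hat\eta_n),$

where

$\Delta_{4n}(\theta_0, \hat\theta_n, \hat\eta_n) = \sqrt{n}\, \tilde I_0^{-1} \Delta_{1n}(\theta_0, \hat\theta_n, \hat\eta_n) + \sqrt{n}\, \tilde I_0^{-1} \Delta_{2n}(\theta_0, \hat\theta_n, \hat\eta_n)(\hat\theta_n - \theta_0)$
$\qquad + \frac{1}{2}\sqrt{n}\, \tilde I_0^{-1} (\hat\theta_n - \theta_0)^T \otimes P_n\ell^{(3)}(\theta_n^*, \hat\theta_n, \hat\eta_n) \otimes (\hat\theta_n - \theta_0)$

and $\Delta_{1n}$ and $\Delta_{2n}$ are defined in the proof of Lemma 2.1. The
orders of magnitude of $\Delta_{1n}(\theta_0, \hat\theta_n, \hat\eta_n)$ and $\Delta_{2n}(\theta_0, \hat\theta_n, \hat\eta_n)$ obtained in the
proof of Lemma 2.1 imply that the order of magnitude of $\Delta_{4n}(\theta_0, \hat\theta_n, \hat\eta_n)$ is
$O_P(M_n(r))$, as desired.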

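We next show (15). By (30) in Lemma 2.1 below, we have

$\log pl_n(\hat\theta_n) = \log pl_n(\theta_0) + (\hat\theta_n - \theta_0)^T \sum_{i=1}^n \tilde\ell_0(X_i) - \frac{n}{2}(\hat\theta_n - \theta_0)^T \tilde I_0 (\hat\theta_n - \theta_0) + \Delta_{3n}(\theta_0, \hat\theta_n, \hat\eta_n).$

(30) also implies that

$\log pl_n(\tilde\theta_n) = \log pl_n(\hat\theta_n) + (\tilde\theta_n - \hat\theta_n)^T \left( \sum_{i=1}^n \tilde\ell_0(X_i) - n \tilde I_0 (\hat\theta_n - \theta_0) \right)$
$\qquad - \frac{n}{2}(\tilde\theta_n - \hat\theta_n)^T \tilde I_0 (\tilde\theta_n - \hat\theta_n) + \Delta_{3n}(\theta_0, \tilde\theta_n, \hat\eta_{\tilde\theta_n}) - \Delta_{3n}(\theta_0, \hat\theta_n, \hat\eta_n).$

Define $\Delta_{5n}(\tilde\theta_n, \hat\theta_n) = \log pl_n(\tilde\theta_n) - \log pl_n(\hat\theta_n) + (n/2)(\tilde\theta_n - \hat\theta_n)^T \tilde I_0 (\tilde\theta_n - \hat\theta_n)$.
By considering (26), we can obtain the respective upper and lower bounds
of $\Delta_{5n}(\tilde\theta_n, \hat\theta_n)$ as follows:

$\Delta^U_{5n} = -\sqrt{n}(\tilde\theta_n - \hat\theta_n)^T \tilde I_0 \Delta_{4n}(\theta_0, \hat\theta_n, \hat\eta_n) + \Delta^U_{3n}(\theta_0, \tilde\theta_n, \hat\eta_{\tilde\theta_n}) - \Delta^L_{3n}(\theta_0, \hat\theta_n, \hat\eta_n),$

$\Delta^L_{5n} = -\sqrt{n}(\tilde\theta_n - \hat\theta_n)^T \tilde I_0 \Delta_{4n}(\theta_0, \hat\theta_n, \hat\eta_n) + \Delta^L_{3n}(\theta_0, \tilde\theta_n, \hat\eta_{\tilde\theta_n}) - \Delta^U_{3n}(\theta_0, \hat\theta_n, \hat\eta_n),$

where $\Delta^U_{3n}$ and $\Delta^L_{3n}$ are defined in the proof of Lemma 2.1 and also shown to
have magnitude $O_P(g_r(\|\tilde\theta_n - \hat\theta_n\|))$. Now the assumptions in Section 2 imply
that $\Delta^U_{5n}(\tilde\theta_n, \hat\theta_n)$ and $\Delta^L_{5n}(\tilde\theta_n, \hat\theta_n)$ are of order $O_P(g_r(\|\tilde\theta_n - \hat\theta_n\|))$, and the
proof is complete. □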

Lemma 2.1. Assuming the conditions of Theorem 2, we have 

(28) Fj(9 , 9 n , %J = Fj + P (M n {r) + \\9 n - 9 n \\f , 

(29) FJ(9 ,9 n ,fjs) = Pl + P {M n {r) + || 

@n @n ||)) 
n 

\ogpl n {9 n ) = logpl n (9 ) + (9 n - 9 ) T Y / UXi) 

(30) 

- - 0o) T h{9n - 9 Q ) + P (g r (\\9 n - 9 n \\)) 
for any random sequence 9 n — 9 n 0. 
Proof. By Taylor expansion of $\theta \mapsto P\dot\ell(\theta_0, \theta, \hat\eta_{\tilde\theta_n})$, we obtain:

p£(e , e n , fj L ) = pi(e ,e ,f) §n ) + P£ t , e (e , e , %j{e n - e ) 

+ \(e n - 0o) T ® P(t,e,d(0o, eifjgj ® n - 9 ) 

= Ai (0 O , 0n,%J> 

where 9\ is intermediate between 9 n and 9$. The assumptions in Section 2 
imply Ai(0 o , 9~n,f)§ n ) has order P (M n (r) + \\9 n - 9 n ||) 2 . By writing G n (£(6 , 
8n,Vg n ) ~ to) as the summation of G n (£(9o, 9 n , fjg ) — £(9o,0Q,fig n )) and 
G n (^(#cb ^ ) ~~ ^o)i we obtain that the difference between P n £(9 ,9 n ,r]Q ) 
and P n ^o is 

Ai n ((9 , 6*n, %J = Ai(0 O) O n ,V § J + n- 1 / 2 G ft ^ |fl (0 o , 02,V Sn )(6„ - 6 Q ) 

+ n- 1 / 2 G n (£(9 ,9 ,?] §n )-£ ), 

where 9% is intermediate between 9 n and 9q. The order of magnitude of 
Ai n (#O)0n>^ ) follows from the assumptions (6) and CI. This completes 
the proof of (28). 

By similar analysis, we obtain 

P£(9 ,9 n , fj § J = P£ + A 2 (9 , 9 n , % n ) 

and 

Wj(6 , 9 n , f)6 n ) = P*0 + A 2n (&oA, ) , 

where A 2 (0 O , 6*n, % n ) = (0„ - O ) T ® P^ A e (0 O , ^ , Vg n ) + Wo , #0 , fjg n ) " Pto] 
and A 2n (0o, n , flej = A 2(#o, #n, %J + n~" 1/2 G n ^(0 o , 0„, %J, and where 0| 
is in between 9q and 9 n . The assumptions in Section 2 now yield the desired 
order of magnitude in (29). 

Next, we will show (30). Note that 

ra - 1 (logpZ n (0 n )-logpZ n (0o))=P^(^,^ I % Ti )-Pn^^O^O J ^o)- 

The right-hand side of the above equation is bounded below and above 
by P n (£(9 n ,ip n ) — £(Oo,ip n )), where the lower and upper bounds separately 
correspond to ip n = (6o,f)g ) and (0™,% ). By applying a three-term Tay- 
lor expansion to both upper and lower bounds, we obtain the correspond- 
ing upper bound, A^ n (9o,9 n ,fjg ), and lower bound, A^ n (9o, 9 n , fj§ ), for 

A3n(0O) On-, Vg n ) defined as follows: 

n 

A 3n (0 o A,%J = log P l n (9 n ) - logpl n (9 ) - (On - 9 ) T Y / to(X i ) 

i=l 

+ ±n(9 n -9 ) T I (9 n -9 ), 
where

= n(6 n -6 ) T A ln (6 ,6 n ,fi § J 

d d d 

+ Z E E E ^nhi,t h t k (0t,0 n , Vg n )(Qn ~ #o)i(#n - )j{9 n - 6o)k, 
i=lj=lk=l 

= n(6 n -6 ) T A ln (e ,6 ,fi ea ) 

d d d 

+ S E E E P rA,^ fc (et,6 Q ,fje )(6n - )i(0n - e )j(§ n - e ) k , 
D i=l j=lfe=l 

where #4 and #5 are in between #0 and (28) and (29) yield the order of 
A^ n (0 , 9 n , rj Sn ) and Af„ (0„ , # n , 77^ ) , which is O p (g r ( 1 1 § n - 8 n \ \ ) ) . □ 

Proof of Theorem 3. Suppose that F n (-) is the posterior profile dis- 
tribution of y/ng n with respect to the prior p($), where the vector Q n is 

~l/2 

defined as I (9 — 6 n ). Let the parameter set for g n be E n . The whole proof 
of Theorem 3 can be briefly summarized in the following expression: 

p (0 = / en g(- c,n-^]n5„P(gn + /o gn) ^ ^gn 
f ^ 4- f~ 1/2 n \ P l «( § « +i q 1/2 ^) d 

Note that above is the short notation for d^ n i x • • • x dg n d- We first 
partition the parameter set H n as {S n fl {||gn|| > r n}} U {^n. H {||gn.|| < r n}}- 
By choosing the proper order of r n , we find the posterior mass in the first 
partition is of arbitrarily small order and the mass inside the second par- 
tition region can be approximated by a stochastic polynomial in powers of 
ra" 1 / 2 with an error of order dependent on the convergence rate of the nui- 
sance parameter. This general approach applies to both the denominator 
and numerator, yielding a quotient series that leads to the desired result. 
Before giving the formal proof, we need two intermediate lemmas:

Lemma 3.1. Choose $r_n = o(n^{-1/3})$ such that $\sqrt{n}\, r_n \to \infty$. Under the
conditions of Theorem 3, we have

(31) $\int_{\|\varrho_n\| > r_n} p(\hat\theta_n + \tilde I_0^{-1/2} \varrho_n) \frac{pl_n(\hat\theta_n + \tilde I_0^{-1/2} \varrho_n)}{pl_n(\hat\theta_n)}\, d\varrho_n = O_P(n^{-M}),$

for any positive number $M$.

Proof. Fix $r > 0$. We have

$\int_{\|\varrho_n\| > r} p(\hat\theta_n + \tilde I_0^{-1/2} \varrho_n) \frac{pl_n(\hat\theta_n + \tilde I_0^{-1/2} \varrho_n)}{pl_n(\hat\theta_n)}\, d\varrho_n \le 1\{\Delta_n^r < -n^{-1/2}\} \exp(-\sqrt{n}) \int_\Theta p(\theta)\, d\theta + 1\{\Delta_n^r \ge -n^{-1/2}\},$

where $\Delta_n^r = \sup_{\|\varrho_n\| > r} \Delta_n(\hat\theta_n + \tilde I_0^{-1/2} \varrho_n)$. By Lemma 3.2 in [4],
$1\{\Delta_n^r \ge -n^{-1/2}\} = O_P(n^{-M})$ for any fixed $r > 0$. This implies that there exists a
positive decreasing sequence $r_n = o(n^{-1/3})$ with $\sqrt{n}\, r_n \to \infty$ such that (31)
holds. □

Lemma 3.2. Choose r n = o(ra -1 / 3 ) such that \fnr n — > oo. Under the con- 
ditions of Theorem 3, we have 



(32) 



Wen \\<r n 



pl n {9 n + Iq 1/2 Qn, 



P{6n + Iq Qn) ~ exp ( —-^QnQn ) p(&r, 



pln(&) 



ri 



dQ n 



= P (n- 1 / 2 M n (r)). 
Proof. The posterior mass over the region ||^n|| — is bounded by 



(*) 



\\o n \\<r„ 



pl n 0n + lo Qn) 



Pln(0n) 



n 



p(9 n ) -exp( --QnQn )p{0 r . 



dQr, 



+ 



(**) 



lkn||<rn 



pln{0n + k 1/2 Qn) f-1/2 



Pln(0n) 



pln(0 n + I Q 1/2 Q n ] 



pln(0n) 



P(0n) 



Using (15) when r > 1/2, we obtain 



(*) 



Qn \\<r„ 



p(9 n ) exp 



nQnQn 



dQ n - 



eMOp{n\\ Qn \f + n- 1 / 2 ))-l 



dQn 






n 



-1/2 



u n \\<^/nr n 



p(6 n ) exp 



T 



x |exp(n- 1 / 2 (|| Un || 3 + l)0 P (l))-l| 



du r 



rT x x P (1) x 
P (n- 1 ), 



\\u„\\<^nr„ 



p(9 n )exp 



T 



where the second equality follows by replacing \fnq n with u n , and the 
third equality follows from the fact that | exp(n _1 / 2 (||ii n || 3 + l)Op(l)) — 1| = 
P (l)n^ 1 / 2 x (||u n || 3 + 1), since n^^WuJ 3 = o(l). 
However, when 1/4 < r < 1/2, we obtain 



Il0n||<rv 



Qn\\<rr, 



p(6 n )exp 



p(9 n )exp 



nQnQn 



nQnQn 



\exp(0 P (g r (\\g r , 



IMUnW) x O p (1)\ 



1| 



dg n . 



dg r , 



When 1/4 <r < 1/3, Op(^(lkn||)) = Op(n 1_2r ||^ n || + rT 2r+1 / 2 ). Note that 

n 2r ~ 1-5 satisfying r n = o(n~ 1 / 3 ) with 
+ n - 2r+l ' 2 = o(l) 



l-2r| 



there exists a <5 > such that 
y/nr n — > oo for any 1/4 < r < 1/3. Therefore n 

when r n is taken equal to ra 2r ~ 1 ~ <5 for some 5 > 0. In this case, it implies 
that g r {\\Qn\\) = « 1_2r ||0n|| + n" 2r+1/2 for 1/4 < r < 1/3. However g r (\\Qn\\) = 
9r(\\Qn\\) for 1/3 < r < 1/2 since gy(||£n||) = o(l) when r is in this range. 
Combining with previous analyses, we have (*) = Op(n~ 2r ) for 1/4 < r < 
1/2. Summarizing the above analysis, we have (*) = Op(n _1 / 2 M n (r)). 

By the following analysis of (**), we will be able to show that (**) = 
Op(n _1 ) for r > 1/2 since exp(Op(n\\g n \\ 3 + n^ 1 / 2 )) = Op(l) with ||g n j| < 



(**) 



'\\Qn\\<rn 
<M 



\p{K) T h ^Vlexp 



n 



Qn \\<r n 



n T 



2 Q n Qn + P (n\\g n \\ 6 + ri 



dg r , 



-1/2) 



dg n 



x sup exp(Op(n\\g n \\ 3 + n 1//2 )), 
||e«||<r„ 



— 1/2 

where Q* n is intermediate between 9 n and 6 n + I Q g n . 

In the case that l/4<r<l/2, we have the same conclusion: 



(**) 



||en||<Tn 



\piKfh 1/2 £-n|exp( -^glg n + P (g r (\\g n \\)) 



ir 



dg n 
<



llfti||< r 
+ 



n T 

|£n||exp[ --g n g T 



11 



dg r , 



g n \\exp[-^g^g n )\exp(Op(g r (\\g n \\))) - l\dg n 

\\Qn\\<r n V ^ 



< P {n~ l ) + P {n- 2r - 1/2 ) = P {n~ l ), 



—1/2 

where Q* n is an intermediate value between 6 n and 6 n + I g n . The last 
inequality follows from the analysis of (*) when 1/4 < r < 1/2. Hence we 
have proved that (**) = Op(n _1 ) for r > 1/4. This completes the proof. □ 



Next we start the formal proof of Theorem 3. Note first that 



P{»n + i Q Qn) — Y\ 



dg n 



{||en||>r„}ns„ 

+ 



n(0 4- T- 12 n ^n^n+Ip g r > 
P\Pn + ^ Qn) 77T\ 



dg n 



L 



en\\<r n }r\E n 



n(ft 4- f-Wn \P l n0n + I O V2 Qn 
P{"n + lo Qn)- 



Pln{ 



dg n . 



By Lemma 3.1, the first integral on the right is of order Op(n~ 1 ^ 2 M n (r)). 
The second integral on the right can be decomposed into the following sum- 
mands: 



gn\\<rn}nS„ 

+ 



fQ i r-V 2 \P l n{6n + I Qn) n T 

p[Pn + h Qn) 7-7^ exp[—-g n g n )p{9 r , 

pln\P) V 1 



dg r , 



{||en||<r„}ns r 



11 



exp( ~QnQn ) P^n 



dg n . 



The first part in the above is bounded by Op(n 1 / 2 M n (r)) via Lemma 3.2. 
The second part equals 



n~ 1/2 p(9 n ) 



-«£«n/2 dUr 



{ 1 1 «n 1 1 < \fnr-n } D y/nE 
T 



n^/ 2 p(6 n ) / e~ u ^l 2 du n + 0(n^/ 2 M n {r)), 



where u n = y/ng n . The above equality follows from the inequality e y2 ^ 2 dy < 
x~ 1 e~ x ' 1 1 2 for any x > 0. 

Consolidating the above analysis, we obtain 
(33)



/ „ 



P\Pn + ^0 On) — J— 

plniPn) 



dg n 



= n^piQ^fl 2 + P (n- l / 2 M n (r)), 
and, by similar analysis, we also have 



(34) 



e„6(-oo,n- 1 /2^]nB Tl 



n(ft -L r l l 2 n \ P l ^n + k ^ Qn) 
p[p n +lQ Q n ) 



dg r 



Pln{0n) 

- yTy/2 dy + P (n- l / 2 M n {r)). 



x-x(-°o,£d] 

The quotient of (33) and (34) generates the desired error rate for fixed £. 
Note, however, that the above conclusions are unchanged if £ is replaced by 
an arbitrary sequence {£„} G R d . Thus (21) follows, proving Theorem 3 in 
its entirety. □ 

Proof of Corollary 1. Prom the proof of Theorem 3, we have 



P el x(V^fo /2 (0-dn)<0 



/ e „e(-oo,n-V2|]nH n P(°n + -^0 Qn 



1/2 -\pl„(6 n +I 1/2 Qn) 



plnifin) 



dg n 



-1/2 s pi n {e n +i 1/2 Q n ) 
By differentiating both sides relative to £ and combining with (33), we obtain 



/n(0 



/0(^n + 



Pln(9n) 



(2ir) d /*p(8 n ) + Op(M n (r)) 

By analysis similar to the proof of Corollary 2 in [4] , the numerator equals 
p(9 n ) exp(-£ T £/2) + P (M n (r)). This completes the proof. □ 

Proof of Corollary 2. We only show (23) in what follows. Expres- 
sion (24) can be shown similarly. The expansion in (23) is the quotient of 
two expansions of the form (33) and (34). We can show this as follows: First, 



f ID I r^ 1 / 2 \Pln(9n+I n Qn) i 

f en es n enP(0n + I o g n ) pi J n) d 6n 

[ n (f) +r 1/2 n \ P^( § «+^ 1/2 en) , 
J en es n P^n + l Qn) dg n 
The denominator is $n^{-1/2}(2\pi)^{d/2} p(\hat\theta_n) + O_P(n^{-1/2} M_n(r))$ by (33). Similarly,
by the proof of Theorem 3 we know the numerator is a random vector of
the order $O_P(n^{-2r-1/2} + n^{-3/2})$. This yields the desired conclusion. □

Proof of Theorem 4. By Lemma 4.1 in [4], we can easily show that $\kappa_{n\alpha}=\tilde I_0^{-1/2}z_\alpha+O_P(M_n(r))$ for any $\xi<\alpha<1-\xi$ and some choice of $z_\alpha$, where $\xi\in(0,1/2)$. Note that $\kappa_{n\alpha}$ is not unique since the $\alpha$th quantile of a $d$-dimensional standard normal distribution, $z_\alpha$, is not unique when $d>1$. The classical Edgeworth expansion implies that $P(n^{-1/2}\sum_{i=1}^{n}\tilde I_0^{-1/2}\tilde\ell_0(X_i)\le z_\alpha+a_n(\alpha))=\alpha$, where $a_n(\alpha)=O(n^{-1/2})$, for $\xi<\alpha<1-\xi$. This $a_n(\alpha)$ is thus uniquely determined for each fixed $z_\alpha$ since $\tilde\ell_0(X_i)$ has at least one absolutely continuous component. Let
\[
\hat\kappa_{n\alpha}=\tilde I_0^{-1/2}z_\alpha+\Bigl(\sqrt{n}(\hat\theta_n-\theta_0)-n^{-1/2}\sum_{i=1}^{n}\tilde I_0^{-1}\tilde\ell_0(X_i)\Bigr)+\tilde I_0^{-1/2}a_n(\alpha).
\]
Then $P(\sqrt{n}(\hat\theta_n-\theta_0)\le\hat\kappa_{n\alpha})=\alpha$. Combining with (14), we obtain $\hat\kappa_{n\alpha}=\kappa_{n\alpha}+O_P(M_n(r))$. The uniqueness of $\hat\kappa_{n\alpha}$ follows from that of $a_n(\alpha)$ for each fixed $z_\alpha$, up to a term of order $O_P(M_n(r))$.
□
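The $O(n^{-1/2})$ size of the Edgeworth correction $a_n(\alpha)$ can be illustrated in a toy case that is not taken from the paper: i.i.d. Exp(1) summands standing in for the standardized efficient score. The sketch below assumes this stand-in model because the normalized sum then has an exact Erlang distribution, so no simulation is needed; all function names are hypothetical.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def erlang_cdf(n: int, x: float) -> float:
    """P(Gamma(n, 1) <= x) for integer shape n, using the Poisson-sum identity."""
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    # Each term exp(-x) x^k / k! is evaluated in log space to avoid overflow.
    poisson_lower = sum(
        math.exp(k * log_x - math.lgamma(k + 1) - x) for k in range(n)
    )
    return 1.0 - poisson_lower

def normal_approx_error(n: int, z: float) -> float:
    """P(n^{-1/2} sum_i (X_i - 1) <= z) - Phi(z) for X_i i.i.d. Exp(1)."""
    return erlang_cdf(n, n + z * math.sqrt(n)) - std_normal_cdf(z)

for n in (25, 100, 400):
    err = normal_approx_error(n, 1.645)
    print(n, err, math.sqrt(n) * abs(err))
```

In this toy model the rescaled error $\sqrt{n}\,|P(\cdot\le z)-\Phi(z)|$ stays of constant order as $n$ grows, which is the behaviour the proof relies on when it absorbs $a_n(\alpha)$ into $O_P(M_n(r))$.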

Proof of Theorem 5. Under the assumptions of Theorem 5, we next show that $\chi_{b,\alpha}=\chi^2_{d,\alpha}+O_P(M_n(r))$ for $\xi<\alpha<1-\xi$, where $\xi\in(0,1/2)$. It is sufficient to show that $\tilde P_{\theta|\tilde X}(PLR_b(\theta)\le\chi^2_{d,\alpha})=\alpha+O_P(M_n(r))$ by considering the form of $PLR_b(\theta)$ and (22). Based on the analysis in the proof of Theorem 2, the term $O_P(g_r(\|\tilde\theta_n-\hat\theta_n\|))$ in (15) is actually bounded above by $\overline\Delta_n(\tilde\theta_n,\hat\theta_n)$ and bounded below by $\underline\Delta_n(\tilde\theta_n,\hat\theta_n)$. Thus it yields the inequality
\[
n(\theta-\hat\theta_n)^{T}\tilde I_0(\theta-\hat\theta_n)-\overline\Delta_n(\theta,\hat\theta_n)\le PLR_b(\theta)\le n(\theta-\hat\theta_n)^{T}\tilde I_0(\theta-\hat\theta_n)-\underline\Delta_n(\theta,\hat\theta_n),
\]
so that we have constructed the upper bound and lower bound for $\tilde P_{\theta|\tilde X}(PLR_b(\theta)\le\chi^2_{d,\alpha})$.

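We next show that the upper and lower bounds match asymptotically with $\alpha+O_P(M_n(r))$. Without loss of generality, we only consider the upper bound in what follows:
\[
\begin{aligned}
&\tilde P_{\theta|\tilde X}\bigl(n(\theta-\hat\theta_n)^{T}\tilde I_0(\theta-\hat\theta_n)\le\chi^2_{d,\alpha}+\overline\Delta_n(\theta,\hat\theta_n)\bigr)\\
&\qquad\le\tilde P_{\theta|\tilde X}(W_n)+\tilde P_{\theta|\tilde X}(\|\varrho_n\|>r_n)\\
&\qquad=\tilde P_{\theta|\tilde X}(W_n)+\frac{\int_{\{\|\varrho_n\|>r_n\}\cap\Xi_n}\rho(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)\,\dfrac{pl_n(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)}{pl_n(\hat\theta_n)}\,d\varrho_n}{\int_{\Xi_n}\rho(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)\,\dfrac{pl_n(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)}{pl_n(\hat\theta_n)}\,d\varrho_n}\\
&\qquad\le\tilde P_{\theta|\tilde X}(W_n)+O_P(n^{-M}),
\end{aligned}
\]
where $r_n=o(n^{-1/3})$ with $\sqrt{n}r_n\to\infty$, $W_n=\{n\varrho_n^{T}\varrho_n\le\chi^2_{d,\alpha}+\overline\Delta_n(\theta,\hat\theta_n)\}\cap\{\|\varrho_n\|\le r_n\}$, and $M$ is an arbitrary positive number. The last inequality above follows from Lemma 3.1 and (33).

We next study $\tilde P_{\theta|\tilde X}(W_n)$. Accordingly,
\[
\begin{aligned}
\tilde P_{\theta|\tilde X}(W_n)
&=\frac{\int_{W_n}\rho(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)\,\dfrac{pl_n(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)}{pl_n(\hat\theta_n)}\,d\varrho_n}{\int_{\Xi_n}\rho(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)\,\dfrac{pl_n(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)}{pl_n(\hat\theta_n)}\,d\varrho_n}\\
&=\frac{\int_{W_n}\Bigl[\rho(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)\,\dfrac{pl_n(\hat\theta_n+\tilde I_0^{-1/2}\varrho_n)}{pl_n(\hat\theta_n)}-\rho(\hat\theta_n)\exp\bigl(-\tfrac{n}{2}\varrho_n^{T}\varrho_n\bigr)\Bigr]d\varrho_n}{n^{-1/2}\rho(\hat\theta_n)(2\pi)^{d/2}+O_P(n^{-1/2}M_n(r))}\\
&\quad+\frac{\int_{W_n}\rho(\hat\theta_n)\exp\bigl(-\tfrac{n}{2}\varrho_n^{T}\varrho_n\bigr)\,d\varrho_n}{n^{-1/2}\rho(\hat\theta_n)(2\pi)^{d/2}+O_P(n^{-1/2}M_n(r))}\\
&=O_P(M_n(r))+\frac{\int_{W_n}\rho(\hat\theta_n)\exp\bigl(-\tfrac{n}{2}\varrho_n^{T}\varrho_n\bigr)\,d\varrho_n}{n^{-1/2}\rho(\hat\theta_n)(2\pi)^{d/2}+O_P(n^{-1/2}M_n(r))}\\
&=\frac{\int_{V_n}\rho(\hat\theta_n)\exp\bigl(-\tfrac{n}{2}\varrho_n^{T}\varrho_n\bigr)\,d\varrho_n+\int_{W_n-V_n}\rho(\hat\theta_n)\exp\bigl(-\tfrac{n}{2}\varrho_n^{T}\varrho_n\bigr)\,d\varrho_n}{n^{-1/2}\rho(\hat\theta_n)(2\pi)^{d/2}+O_P(n^{-1/2}M_n(r))}+O_P(M_n(r))\\
&=\alpha+\frac{\int_{W_n-V_n}\rho(\hat\theta_n)\exp\bigl(-\tfrac{n}{2}\varrho_n^{T}\varrho_n\bigr)\,d\varrho_n}{n^{-1/2}\rho(\hat\theta_n)(2\pi)^{d/2}+O_P(n^{-1/2}M_n(r))}+O_P(M_n(r)),
\end{aligned}
\]
where $V_n=\{n\varrho_n^{T}\varrho_n\le\chi^2_{d,\alpha}\}$. The third equality in the above follows from (33) and Lemma 3.2 in the proof of Theorem 3. We next study the fraction in the last equality above. It is easy to show that $\{W_n-V_n\}\subseteq\{W_n-V_n\cap\{\|\varrho_n\|\le r_n\}\}\subseteq\{\chi^2_{d,\alpha}<n\varrho_n^{T}\varrho_n\le\chi^2_{d,\alpha}+\overline\Delta_n(\theta,\hat\theta_n)\}\cap\{\|\varrho_n\|\le r_n\}\equiv T_n$. By replacing $\sqrt{n}\varrho_n$ with $u_n$, $T_n$ can be reexpressed as $\{\chi^2_{d,\alpha}<u_n^{T}u_n\le\chi^2_{d,\alpha}+\overline\Delta_n(\theta,\hat\theta_n)\}\cap\{\|u_n\|\le\sqrt{n}r_n\}$.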

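We next consider the order of $\int_{W_n-V_n}\rho(\hat\theta_n)\exp(-n\varrho_n^{T}\varrho_n/2)\,d\varrho_n$ for $r$ in different ranges. For $r\ge1/2$, $\overline\Delta_n(\theta,\hat\theta_n)=O_P(n^{-1/2}+n^{-1/2}\|u_n\|^3)$. Under the condition that $\|u_n\|\le\sqrt{n}r_n$, $\overline\Delta_n(\theta,\hat\theta_n)=o_P(1)$. Hence any subsequence of $u_n$ contained in $T_n$ is not diverging. In this case, $\overline\Delta_n(\theta,\hat\theta_n)=O_P(n^{-1/2})$. In summary we have the following inequalities:
\[
\int_{W_n-V_n}\rho(\hat\theta_n)\exp\Bigl(-\frac{n}{2}\varrho_n^{T}\varrho_n\Bigr)d\varrho_n
\le\int_{T_n}\rho(\hat\theta_n)\exp\Bigl(-\frac{n}{2}\varrho_n^{T}\varrho_n\Bigr)d\varrho_n
\le n^{-1/2}\int_{Q_n}\rho(\hat\theta_n)\exp\Bigl(-\frac{1}{2}u_n^{T}u_n\Bigr)du_n
\le O_P(n^{-1}),
\]
where $Q_n=\{\chi^2_{d,\alpha}<u_n^{T}u_n\le\chi^2_{d,\alpha}+O_P(n^{-1/2})\}\cap\{\|u_n\|\le\sqrt{n}r_n\}$. Hence $\tilde P_{\theta|\tilde X}(W_n)=\alpha+O_P(n^{-1/2})$ when $r\ge1/2$.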
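The key step above is that the Gaussian mass of a shell $\{c<u^{T}u\le c+\Delta\}$ is of the same order as its thickness $\Delta$. A minimal numerical sketch of this fact for $d=1$ and $d=2$ follows; the cutoffs 3.84 and 5.99 are the usual 95% chi-square cutoffs used here only as illustrative values of $\chi^2_{d,\alpha}$, and the function names are hypothetical.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def shell_mass_d1(c: float, delta: float) -> float:
    """P(c < Z^2 <= c + delta) for Z ~ N(0, 1)."""
    return 2.0 * (std_normal_cdf(math.sqrt(c + delta)) - std_normal_cdf(math.sqrt(c)))

def shell_mass_d2(c: float, delta: float) -> float:
    """P(c < Z1^2 + Z2^2 <= c + delta); chi-square(2) has CDF 1 - exp(-x/2)."""
    return math.exp(-c / 2.0) * (1.0 - math.exp(-delta / 2.0))

def chi2_density_d1(c: float) -> float:
    """Density of the chi-square(1) distribution at c > 0 (decreasing in c)."""
    return math.exp(-c / 2.0) / math.sqrt(2.0 * math.pi * c)

for n in (10**2, 10**4, 10**6):
    delta = n ** -0.5  # shell thickness of order n^{-1/2}
    print(n, shell_mass_d1(3.84, delta) / delta, shell_mass_d2(5.99, delta) / delta)
```

The printed ratios stay bounded as $n$ grows, so a shell of thickness $O_P(n^{-1/2})$ carries mass $O_P(n^{-1/2})$, consistent with the $O_P(n^{-1})$ bound after the extra factor $n^{-1/2}$ from the change of variables.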

Similar arguments will now be applied to the case $1/4<r<1/2$. It is sufficient to show that $\int_{W_n-V_n}\rho(\hat\theta_n)\exp(-\frac{n}{2}\varrho_n^{T}\varrho_n)\,d\varrho_n=O_P(n^{-2r})$ for $r$ in the above range. When $1/3\le r<1/2$ and $\|u_n\|\le\sqrt{n}r_n$, $\overline\Delta_n(\theta,\hat\theta_n)$ converges to zero in probability since $\overline\Delta_n(\theta,\hat\theta_n)=O_P(n^{-1/2}\|u_n\|^3+n^{-r}\|u_n\|^2+n^{1/2-2r}\|u_n\|+n^{-2r+1/2})$. Consequently, $\overline\Delta_n(\theta,\hat\theta_n)$ is $O_P(n^{1/2-2r})$ by the analysis we used for the case when $r\ge1/2$. However, for $1/4<r<1/3$, $\overline\Delta_n(\theta,\hat\theta_n)=O_P(n^{-2r+1/2}\|u_n\|+n^{-2r+1/2})$. By making the same choice of $r_n$ used in the proof of Lemma 3.2, we have $\overline\Delta_n(\theta,\hat\theta_n)=o_P(1)$. Hence there does not exist a diverging subsequence of $u_n$ contained in $T_n$ when we choose this specific $r_n$. This implies that $\overline\Delta_n(\theta,\hat\theta_n)=O_P(n^{1/2-2r})$. In other words, $\overline\Delta_n(\theta,\hat\theta_n)=O_P(n^{1/2-2r})$ for $1/4<r<1/2$. This implies that $\int_{W_n-V_n}\rho(\hat\theta_n)\exp(-\frac{n}{2}\varrho_n^{T}\varrho_n)\,d\varrho_n=O_P(n^{-2r})$. The same arguments also apply to the lower bound of $\tilde P_{\theta|\tilde X}(PLR_b(\theta)\le\chi^2_{d,\alpha})$. Thus we have shown that $\chi_{b,\alpha}=\chi^2_{d,\alpha}+O_P(M_n(r))$ for $\xi<\alpha<1-\xi$, where $\xi\in(0,1/2)$.

If we can show that $\chi_{f,\alpha}=\chi^2_{d,\alpha}+O_P(M_n(r))$, then the whole proof is complete. Combining (14) and (15) in Theorem 2, we can rewrite $PLR_f(\theta_0)$ as $n^{-1}\sum_{i=1}^{n}\tilde\ell_0(X_i)^{T}\,\tilde I_0^{-1}\sum_{i=1}^{n}\tilde\ell_0(X_i)+O_P(M_n(r))$. By the classical Edgeworth expansion, we have $P(n^{-1/2}\sum_{i=1}^{n}\tilde I_0^{-1/2}\tilde\ell_0(X_i)\le z_\alpha)=\alpha+O(n^{-1/2})$, which directly yields $P(n^{-1}\sum_{i=1}^{n}\tilde\ell_0(X_i)^{T}\,\tilde I_0^{-1}\sum_{i=1}^{n}\tilde\ell_0(X_i)\le\chi^2_{d,\alpha}+O(n^{-1/2}))=\alpha$. Thus $\chi_{f,\alpha}=\chi^2_{d,\alpha}+O_P(M_n(r))$. This completes the proof. □

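Proof of Lemma 2. We first review some known results from [18] about the Cox model with current status data. For some constant $C$ and every $x$ (under the assumed regularity conditions), we have
\[
\begin{aligned}
|lik(\theta_0,\Lambda_0)(x)-lik(\theta_0,\Lambda)(x)|&\le C|\Lambda(y)-\Lambda_0(y)|,\\
|\dot\ell(\theta_0,\theta_0,\Lambda)(x)-\dot\ell(\theta_0,\theta_0,\Lambda_0)(x)|&\le C|\Lambda(y)-\Lambda_0(y)|,\\
|lik(\theta_0,\Lambda)(x)-lik(\theta_0,\Lambda_0)(x)-A_0(\Lambda-\Lambda_0)(x)\,lik(\theta_0,\Lambda_0)(x)|&\lesssim|\Lambda(y)-\Lambda_0(y)|^2,
\end{aligned}
\tag{35}
\]
where $A_0=A_{\theta_0,\Lambda_0}$ and $A_{\theta,\Lambda}$ is the score operator for $\Lambda$ at $(\theta,\Lambda)$, for example, the Fréchet derivative of $\log p_{\theta,\Lambda}$ relative to $\Lambda$. Thus by the decomposition of $P\dot\ell(\theta_0,\theta_0,\Lambda)$ in what follows, we can show (9) holds with the $L_2$ norm on $\Lambda$:
\[
P\dot\ell(\theta_0,\theta_0,\Lambda)
=P\Bigl[(\dot\ell(\theta_0,\theta_0,\Lambda)-\tilde\ell_0)\,\frac{p_0-p_{\theta_0,\Lambda}}{p_0}\Bigr]
-P\Bigl[\dot\ell(\theta_0,\theta_0,\Lambda_0)\Bigl(\frac{p_{\theta_0,\Lambda}-p_0}{p_0}-A_0(\Lambda-\Lambda_0)\Bigr)\Bigr].
\]
Next we can show (7) by the following inequality:
\[
\begin{aligned}
P(\ddot\ell(\theta_0,\theta_0,\Lambda)-\ddot\ell(\theta_0,\theta_0,\Lambda_0))
&\le P|N(\theta_0,\theta_0,\Lambda)-N(\theta_0,\theta_0,\Lambda_0)|+P|\dot\ell^{2}(\theta_0,\theta_0,\Lambda)-\dot\ell^{2}(\theta_0,\theta_0,\Lambda_0)|\\
&\le P|N(\theta_0,\theta_0,\Lambda)-N(\theta_0,\theta_0,\Lambda_0)|+C\|\Lambda-\Lambda_0\|_{L_2},
\end{aligned}
\]
where $N(t,\theta,\Lambda)=(\partial^2 lik(t,\Lambda_t(\theta,\Lambda))/\partial t^2)/lik(t,\Lambda_t(\theta,\Lambda))$. The second inequality follows from the boundedness of $\dot\ell(t,\theta,\Lambda)$ and (35). We next proceed to derive an upper bound for $P|N(\theta_0,\theta_0,\Lambda)-N(\theta_0,\theta_0,\Lambda_0)|$:
\[
\begin{aligned}
P|N(\theta_0,\theta_0,\Lambda)-N(\theta_0,\theta_0,\Lambda_0)|
&\lesssim P|\Lambda Q(x;\theta_0,\Lambda)-\Lambda Q(x;\theta_0,\Lambda_0)|+P|\Lambda-\Lambda_0|+P|w(\Lambda)-w(\Lambda_0)|\\
&\quad+P|w(\Lambda)\Lambda-w(\Lambda_0)\Lambda_0|\\
&\lesssim P|\Lambda Q(x;\theta_0,\Lambda)-\Lambda Q(x;\theta_0,\Lambda_0)|+\|\Lambda-\Lambda_0\|_{L_2}\\
&\lesssim\|\Lambda-\Lambda_0\|_{L_2},
\end{aligned}
\]
where $w(\Lambda)=\phi(\Lambda)\,h_{00}\circ\Lambda_0^{-1}\circ\Lambda$. Clearly, $w(\Lambda_0)=\phi(\Lambda_0)h_{00}$. Note that $w(\Lambda)$ can be expressed as $\Lambda\sigma(\Lambda)v(\Lambda)$, where $\sigma(\Lambda)=\phi(\Lambda)/\Lambda$ and $v(\Lambda)=h_{00}\circ\Lambda_0^{-1}\circ\Lambda$. Note that $\sigma(\Lambda)$ and $v(\Lambda)$ are both assumed bounded and Lipschitz. Hence $P|w(\Lambda)-w(\Lambda_0)|\lesssim\|\Lambda-\Lambda_0\|_{L_2}$ and $P|\Lambda w(\Lambda)-\Lambda_0w(\Lambda_0)|\lesssim\|\Lambda-\Lambda_0\|_{L_2}$. This explains the second inequality in the above. The inequality $P|\Lambda Q(x;\theta_0,\Lambda)-\Lambda Q(x;\theta_0,\Lambda_0)|\lesssim\|\Lambda-\Lambda_0\|_{L_2}$ follows from the inequality $|(u(e^{u}-e^{v}))/((e^{u}-1)(e^{v}-1))|\lesssim|u-v|$, given that $u>0$ lies in some compact set and $v>0$ lies in some compact set. Combining this with the previous analysis, we can conclude that (7) holds under the given assumptions for the Cox model with current status data. Similar techniques can be applied to the verification of (8). We omit the details.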
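The elementary inequality invoked above can be checked on a compact set bounded away from zero. The sketch below assumes $u,v\in[a,b]$ with $0<a<b$ and uses the crude constant $C=be^{b}/(e^{a}-1)^2$, which follows from the mean value theorem; the interval $[0.5,3]$, the grid size and the function names are illustrative assumptions.

```python
import math

def g(u: float, v: float) -> float:
    """u (e^u - e^v) / ((e^u - 1)(e^v - 1)), i.e. u/(e^v - 1) - u/(e^u - 1)."""
    return u * (math.exp(u) - math.exp(v)) / ((math.exp(u) - 1.0) * (math.exp(v) - 1.0))

def lipschitz_constant(a: float, b: float) -> float:
    """Crude constant b e^b / (e^a - 1)^2, valid for a <= u, v <= b with a > 0."""
    return b * math.exp(b) / (math.exp(a) - 1.0) ** 2

def max_ratio(a: float, b: float, steps: int = 60) -> float:
    """Largest value of |g(u, v)| / |u - v| over a grid of distinct pairs."""
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    return max(abs(g(u, v)) / abs(u - v) for u in grid for v in grid if u != v)

print(max_ratio(0.5, 3.0), lipschitz_constant(0.5, 3.0))
```

The constant degenerates as $a\to0$, which is why the argument needs $u$ and $v$ to stay in a compact subset of $(0,\infty)$.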

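Finally, we only need to check (6). Note that $\mathbb{G}_n(\dot\ell(\theta_0,\theta_0,\hat\Lambda_{\tilde\theta_n})-\tilde\ell_0)$ can be written as follows:
\[
\begin{aligned}
&\mathbb{G}_n((\hat\Lambda_{\tilde\theta_n}-\Lambda_0)zQ(x;\theta_0,\Lambda_0))-\mathbb{G}_n((w(\hat\Lambda_{\tilde\theta_n})-w(\Lambda_0))Q(x;\theta_0,\Lambda_0))\\
&\quad+\mathbb{G}_n(Q(x;\theta_0,\hat\Lambda_{\tilde\theta_n})(z\hat\Lambda_{\tilde\theta_n}-w(\hat\Lambda_{\tilde\theta_n}))-Q(x;\theta_0,\Lambda_0)(z\Lambda_0-w(\Lambda_0)))\\
&\quad-\mathbb{G}_n(((z\hat\Lambda_{\tilde\theta_n}-w(\hat\Lambda_{\tilde\theta_n}))-(z\Lambda_0-w(\Lambda_0)))Q(x;\theta_0,\Lambda_0)).
\end{aligned}
\tag{36}
\]
To verify (6), we need to make use of the following technical tools. Clearly, by Lemma 5.13 in [22] we have $\mathbb{G}_n((\hat\Lambda_{\tilde\theta_n}-\Lambda_0)zQ(x;\theta_0,\Lambda_0))=O_P(n^{-1/6}+\|\hat\Lambda_{\tilde\theta_n}-\Lambda_0\|_{L_2}^{1/2})$, since $\alpha=1$ for monotone functions $\Lambda$. Then by the relation $O_P(\|\hat\Lambda_{\tilde\theta_n}-\Lambda_0\|_{L_2}^{1/2})=O_P((|\tilde\theta_n-\theta_0|+n^{-1/3})^{1/2})=O_P(n^{-1/6}+n^{1/6}|\tilde\theta_n-\theta_0|)$, we know that the first line of (36) satisfies (6). Note that the class of Lipschitz functions of $\Lambda$ also has the same upper bound (i.e., $\alpha=1$) for the entropy with bracketing number by Theorem 2.7.11 in [23] and the inequality $N(\epsilon,\mathcal{G},\|\cdot\|_{L_2})\le N_{[\,]}(2\epsilon,\mathcal{G},\|\cdot\|_{L_2})$. This now implies that $\mathbb{G}_n((w(\hat\Lambda_{\tilde\theta_n})-w(\Lambda_0))Q(x;\theta_0,\Lambda_0))=O_P(n^{-1/6}+n^{1/6}|\tilde\theta_n-\theta_0|)$ since $w(\Lambda)$ is Lipschitz in $\Lambda$. Similar arguments apply to the other lines in (36). □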
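The rate relation $(|\tilde\theta_n-\theta_0|+n^{-1/3})^{1/2}\lesssim n^{-1/6}+n^{1/6}|\tilde\theta_n-\theta_0|$ used in this proof is an instance of $\sqrt{a+b}\le\sqrt{b}+a/(2\sqrt{b})$ with $b=n^{-1/3}$. A minimal numerical sketch follows, with $a$ standing in for $|\tilde\theta_n-\theta_0|$; the function names, grid and tolerance are illustrative assumptions.

```python
def rate_lhs(n: int, a: float) -> float:
    """(a + n^{-1/3})^{1/2}, with a playing the role of |theta_tilde - theta_0|."""
    return (a + n ** (-1.0 / 3.0)) ** 0.5

def rate_rhs(n: int, a: float) -> float:
    """n^{-1/6} + n^{1/6} a."""
    return n ** (-1.0 / 6.0) + n ** (1.0 / 6.0) * a

def relation_holds(n: int, a_values, tol: float = 1e-12) -> bool:
    """Check lhs <= rhs on a grid; tol absorbs rounding at a = 0, where equality holds."""
    return all(rate_lhs(n, a) <= rate_rhs(n, a) + tol for a in a_values)

a_grid = [j / 50.0 for j in range(0, 101)]  # a in [0, 2]
print([relation_holds(n, a_grid) for n in (10, 10**3, 10**6)])
```

Squaring the right-hand side gives $n^{-1/3}+2a+n^{1/3}a^2$, which dominates $a+n^{-1/3}$ for every $a\ge0$, so the check is expected to pass for any $n$.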

Proof of Lemma 3. The proof is analogous to that of Lemma 2 in [14]. 

□ 

Proof of Lemma 4. We apply Theorem 1 with $m_{\theta,k}=\lambda(\theta,k)$, where $\lambda(\theta,k)=\log lik(\theta,k)$, since $lik(\theta,k)$ is bounded away from zero and infinity for $(\theta,k)\in\Theta\times\mathcal{O}^M_k$. It suffices to show (10) provided both $P(\lambda(\theta,k_0)-\lambda(\theta_0,k_0))\gtrsim-\|\theta-\theta_0\|^2$ and $P(\lambda(\theta,k)-\lambda(\theta_0,k_0))\lesssim-d_\theta^2(k,k_0)$ hold. Note that the maximality of the point $(\theta_0,k_0)$ for the criterion function $(\theta,k)\mapsto P\lambda(\theta,k)$ implies that $P(\lambda(\theta,k_0)-\lambda(\theta_0,k_0))\gtrsim-\|\theta-\theta_0\|^2$. By using the inequality $P\log(q/p)\le-h^2(p,q)$ and the relationship between Kullback-Leibler divergence and squared Hellinger distance $h$, we can show that $P(\lambda(\theta,k)-\lambda(\theta_0,k_0))\lesssim-\int(\sqrt{p_{\theta,k}}-\sqrt{p_0})^2\,d\mu\lesssim-\|p_{\theta,k}-p_0\|^2_{L_2}$. Hence $d_\theta(k,k_0)=\|p_{\theta,k}-p_0\|_{L_2}$. Thus we only need to verify condition (11) by Lemma 1 to complete the whole proof.
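The inequality $P\log(q/p)\le-h^2(p,q)$, with $h^2(p,q)=\int(\sqrt p-\sqrt q)^2d\mu$, follows from $\log x\le2(\sqrt x-1)$. As a sanity check, the sketch below verifies it for randomly drawn discrete distributions; the discrete setting, the seed and the helper names are assumptions made only for illustration.

```python
import math
import random

def random_pmf(rng: random.Random, size: int):
    """A strictly positive probability vector of the given length."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_log_ratio(p, q) -> float:
    """P log(q/p): minus the Kullback-Leibler divergence from p to q."""
    return sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))

def squared_hellinger(p, q) -> float:
    """h^2(p, q) = sum over atoms of (sqrt(p) - sqrt(q))^2."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def inequality_holds(trials: int = 200, size: int = 6, seed: int = 0) -> bool:
    """Check P log(q/p) <= -h^2(p, q) on randomly generated pairs (p, q)."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_pmf(rng, size), random_pmf(rng, size)
        if expected_log_ratio(p, q) > -squared_hellinger(p, q) + 1e-12:
            return False
    return True

print(inequality_holds())
```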

Condition (13) in Lemma 1 trivially holds by considering the forms of $m_{\theta,k}$ and $d_\theta(k,k_0)$. By Theorem 1, we can show that $d_{\tilde\theta_n}(\hat k_{\tilde\theta_n},k_0)=O_P(\delta_n+\|\tilde\theta_n-\theta_0\|)$ for any $\delta_n$ satisfying $K(\delta_n,S_{\delta_n},L_2(P))\le\sqrt{n}\delta_n^2$, where the function $K$ is defined in (12). In other words, we need to calculate the $\epsilon$-bracketing entropy number for the class of functions $S_{\delta_n}$. To achieve the desired rate (25), we only need to show $H_B(\epsilon,S_{\delta_n},L_2(P))\lesssim\epsilon^{-1/2}$ based on the above discussions. Recall that $S_{\delta_n}=\{x\mapsto\lambda(\theta,k)(x)-\lambda(\theta,k_0)(x):d_\theta(k,k_0)\le\delta_n,\|\theta-\theta_0\|\le\delta_n\}$. By considering Lemma 9.24 in [11], we only need to show that $H_B(\epsilon,\mathcal{C},L_2(P))\lesssim\epsilon^{-1/2}$, where $\mathcal{C}=\{x\mapsto\lambda(\theta,k)(x):J^2(k)+\|k-k_0\|_\infty\le C_1,\|\theta-\theta_0\|\le C_1\}$.

Now we consider $\mathcal{C}_1=\{q_{\theta,k}(x)/(1+J(k)):\|k-k_0\|_\infty\le C_1,\|\theta-\theta_0\|\le C_1\}$. By technical tool T1 below, we obtain $H_B(\epsilon,\mathcal{C}_1,L_2(P))\lesssim\epsilon^{-1/2}$ as desired.

T1. (See [2].) For each $0<C<\infty$ and $\delta>0$ we have
\[
H_B(\delta,\{\eta:\|\eta\|_\infty\le C,\ J_k(\eta)\le C\},\|\cdot\|_\infty)\lesssim\Bigl(\frac{C}{\delta}\Bigr)^{1/k}. \tag{37}
\]

Continuing, note that $\lambda(\theta,k)(X)$ can be rewritten as:
\[
\Delta\log\Phi(q_{\theta,k}A)+(1-\Delta)\log(1-\Phi(q_{\theta,k}A)), \tag{38}
\]
where $A=1+J(k)$ and $q_{\theta,k}\in\mathcal{C}_1$. We next calculate the $\epsilon$-bracketing entropy number with the $L_2$ norm for the class of functions $R_1=\{k_a(t):t\mapsto\log\Phi(at)\text{ for }a\ge1\text{ and }t\in T\}$, where $T$ is some bounded subset in $\mathbb{R}^1$. Note that $k_a(t)$ is increasing (decreasing) in $a$ for $t>0$ ($t<0$). After some derivation, we obtain that $\sup_{t\in T}|k_a(t)-k_b(t)|\lesssim|a-b|$ for any fixed $a,b\ge1$ and $\sup_{a,b\ge A_0,\,t\in T}|k_a(t)-k_b(t)|\lesssim A_0^{-1}$. The above two inequalities imply that the $\epsilon$-bracketing number with uniform norm is of order $O(\epsilon^{-2})$ for $a\in[1,\epsilon^{-1}]$ and is 1 for $a>\epsilon^{-1}$. Thus we know $H_B(\epsilon,R_1,L_2)=O(\log\epsilon^{-2})$. By applying a similar analysis to $R_2=\{k_a(t):t\mapsto\log(1-\Phi(at))\text{ for }a\ge1\text{ and }t\in T\}$, we obtain that $H_B(\epsilon,R_2,L_2)=O(\log\epsilon^{-2})$. This, combined with Lemma 15.2 in [11], yields that $H_B(\epsilon,\mathcal{C},L_2)\lesssim\epsilon^{-1/2}$. Thus far we have shown that $d_{\tilde\theta_n}(\hat k_{\tilde\theta_n},k_0)=O_P(n^{-2/5}+\|\tilde\theta_n-\theta_0\|)$. Now by the usual Taylor expansion and the assumption that $E\,\mathrm{Var}(W|Z)$ is positive definite, we have verified (25).
□

Proof of Lemma 5. Note that $k\in\mathcal{O}^M_k$. Hence we can easily verify assumption B1 since every map $(t,\theta,k)\mapsto(\partial^{l+m}/\partial t^{l}\partial\theta^{m})\ell(t,\theta,k)$ is uniformly bounded. Note that $(C,W)$ lies in some bounded set and $h_0(\cdot)$ is bounded. Hence we can show that the Fréchet derivatives of $k\mapsto\ddot\ell(\theta_0,\theta_0,k)$ and $k\mapsto\ell_{t,\theta}(\theta_0,\theta_0,k)$ for any $k\in\mathcal{O}^M_k$ are bounded operators, that is, $|\ddot\ell(\theta_0,\theta_0,k)(X)-\ddot\ell_0(X)|$ is bounded by the product of some integrable function and $|k-k_0|(Z)$. Thus (7) and (8) are satisfied, and the bounded Fréchet derivative of $k\mapsto\dot\ell(\theta_0,\theta_0,k)$ plus second-order Fréchet differentiability of $k\mapsto lik(\theta_0,k)$ implies (9).

Since the convergence rate is $r=2/5$, the asymptotic equicontinuity condition (6) follows once we show that
\[
\mathbb{G}_n(\dot\ell(\theta_0,\theta_0,\hat k_{\tilde\theta_n})-\tilde\ell_0)=O_P(n^{-3/10}+n^{1/10}\|\tilde\theta_n-\theta_0\|). \tag{39}
\]

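To show (39), we need the following technical tool T2:

T2. (Lemma 3.4.2 in [23].) Let $\mathcal{F}$ be a class of measurable functions such that $Pf^2<\delta^2$ and $\|f\|_\infty\le M$ for every $f$ in $\mathcal{F}$. Then
\[
E^{*}\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim K(\delta,\mathcal{F},L_2(P))\Bigl(1+\frac{K(\delta,\mathcal{F},L_2(P))}{\delta^2\sqrt{n}}M\Bigr),
\]
where $K(\delta,\mathcal{F},\|\cdot\|)=\int_0^{\delta}\sqrt{1+H_B(\epsilon,\mathcal{F},\|\cdot\|)}\,d\epsilon$.
To utilize this tool, note first that (25) implies:
\[
P\biggl(\frac{\dot\ell(\theta_0,\theta_0,\hat k_{\tilde\theta_n})-\tilde\ell_0}{n^{-3/10}+n^{1/10}\|\tilde\theta_n-\theta_0\|}\biggr)^{2}\le O_P(n^{-1/5}).
\]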

We next define the set $\mathcal{Q}_n$ as
for some $\delta>0$. Obviously the function $(\dot\ell(\theta_0,\theta_0,\hat k_{\tilde\theta_n})-\tilde\ell_0)/(n^{-3/10}+n^{1/10}\|\tilde\theta_n-\theta_0\|)\in\mathcal{Q}_n$ on a set of probability arbitrarily close to one, as $C_n\to\infty$. If we can show $\lim_{n\to\infty}E^{*}\|\mathbb{G}_n\|_{\mathcal{Q}_n}<\infty$ by T2, then Lemma 5 is proved.

Note that $\dot\ell(\theta_0,\theta_0,k)$ depends on $k$ in a Lipschitz manner. Consequently we can bound $H_B(\epsilon,\mathcal{Q}_n,L_2(P))$ by the product of some constant and $H(\epsilon,\mathcal{R}_n,L_2(P))$, where $\mathcal{R}_n$ is defined as $\{G_n(k):J(G_n(k))\lesssim n^{3/10},\|G_n(k)\|_\infty\lesssim n^{3/10}\}$, and where $G_n(k)=k/(n^{-3/10}+n^{1/10}\|\theta-\theta_0\|)$. By the main results in [2], we know $H(\epsilon,\mathcal{R}_n,L_2(P))\lesssim(n^{3/10}/\epsilon)^{1/k}$. Note that $\delta_n=n^{-1/10}$ and $M_n=n^{3/10}$ in T2. Thus by calculation using T2, we establish $\lim_{n\to\infty}E^{*}\|\mathbb{G}_n\|_{\mathcal{Q}_n}<\infty$.
□
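The "calculation using T2" can be traced by tracking only the exponents of $n$. The sketch below assumes smoothness index $k=2$ (an inference from the rate $r=2/5=k/(2k+1)$, not stated explicitly in this proof) and keeps only the leading term of the entropy integral, $\int_0^\delta(B/\epsilon)^{1/(2k)}d\epsilon\asymp B^{1/(2k)}\delta^{1-1/(2k)}$ with $B=n^{3/10}$; the function names are hypothetical.

```python
from fractions import Fraction

def entropy_integral_exponent(k: int, delta_exp: Fraction, scale_exp: Fraction) -> Fraction:
    """Exponent of n in B^{s} * delta^{1-s}, with s = 1/(2k), B = n^scale_exp, delta = n^delta_exp."""
    s = Fraction(1, 2 * k)
    return s * scale_exp + (1 - s) * delta_exp

def t2_exponents(k: int, delta_exp: Fraction, scale_exp: Fraction, sup_exp: Fraction):
    """Exponents of n in K(delta) and in K(delta) * M / (delta^2 * sqrt(n)), with M = n^sup_exp."""
    k_exp = entropy_integral_exponent(k, delta_exp, scale_exp)
    second_exp = k_exp - (2 * delta_exp + Fraction(1, 2)) + sup_exp
    return k_exp, second_exp

# delta_n = n^{-1/10}, entropy scale n^{3/10}, sup-norm bound M_n = n^{3/10}, assumed k = 2.
print(t2_exponents(2, Fraction(-1, 10), Fraction(3, 10), Fraction(3, 10)))
```

Both exponents come out as zero under these assumptions, so each factor in the T2 bound is $O(1)$, in line with the claimed finiteness of $\lim_{n\to\infty}E^{*}\|\mathbb{G}_n\|_{\mathcal{Q}_n}$.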



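Proof of Lemma 6. By the assumption that $\Delta_n(\tilde\theta_n)=o_P(1)$, we have $\Delta_n(\tilde\theta_n)-\Delta_n(\theta_0)\ge o_P(1)$. Thus the following inequality holds:
\[
n^{-1}\sum_{i=1}^{n}\log\biggl[\frac{H(\tilde\theta_n,\hat k_{\tilde\theta_n};X_i)}{H(\theta_0,\hat k_{\theta_0};X_i)}\biggr]\ge o_P(1),
\]
where $H(\theta,k;X)=\Delta\Phi(C-\theta W-k(Z))+(1-\Delta)(1-\Phi(C-\theta W-k(Z)))$. By the assumptions on $k$, we know that $H(\tilde\theta_n,\hat k_{\tilde\theta_n};X_i)$ belongs to some $P$-Donsker class. Combining the above conclusion and the inequality $\alpha\log x\le\log(1+\alpha\{x-1\})$ for some $\alpha\in(0,1)$ and any $x>0$, we can show that
\[
P\log\biggl[1+\alpha\biggl(\frac{H(\tilde\theta_n,\hat k_{\tilde\theta_n};X_i)}{H(\theta_0,\hat k_{\theta_0};X_i)}-1\biggr)\biggr]\ge o_P(1). \tag{40}
\]
The strict concavity of $x\mapsto\log(1+\alpha(x-1))$ ensures that
\[
P\log\biggl[1+\alpha\biggl(\frac{H(\tilde\theta_n,\hat k_{\tilde\theta_n};X_i)}{H(\theta_0,\hat k_{\theta_0};X_i)}-1\biggr)\biggr]\le0.
\]
This combined with (40) implies that
\[
P\log\biggl[1+\alpha\biggl(\frac{H(\tilde\theta_n,\hat k_{\tilde\theta_n};X_i)}{H(\theta_0,\hat k_{\theta_0};X_i)}-1\biggr)\biggr]=o_P(1).
\]
The strict concavity of $x\mapsto\log(1+\alpha(x-1))$ forces the result that $P|\Phi(C-\tilde\theta_nW-\hat k_{\tilde\theta_n}(Z))-\Phi(C-\theta_0W-k_0(Z))|=o_P(1)$. The desired conclusion now follows from model identifiability. □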
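The inequality $\alpha\log x\le\log(1+\alpha\{x-1\})$ used in this proof is the concavity of the logarithm applied to the convex combination $\alpha x+(1-\alpha)\cdot1$. A minimal grid check is sketched below; the grids, the tolerance and the function names are illustrative assumptions.

```python
import math

def concavity_gap(alpha: float, x: float) -> float:
    """log(1 + alpha (x - 1)) - alpha log(x); nonnegative for 0 < alpha < 1 and x > 0."""
    return math.log(1.0 + alpha * (x - 1.0)) - alpha * math.log(x)

def gap_nonnegative_on_grid(alphas, xs, tol: float = 1e-12) -> bool:
    """Check the inequality at every (alpha, x) pair, allowing for rounding error."""
    return all(concavity_gap(a, x) >= -tol for a in alphas for x in xs)

alphas = [0.05 * j for j in range(1, 20)]    # alpha in (0, 1)
xs = [0.01 * 1.1 ** j for j in range(0, 98)]  # x from 0.01 up to roughly 100
print(gap_nonnegative_on_grid(alphas, xs))
```

The gap vanishes only at $x=1$, which reflects the strict concavity that the proof exploits to force the likelihood ratio toward one.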